Abstract

The interplay between mutation and selection plays a fundamental role 
in the behaviour of evolutionary algorithms (EAs). However, this inter- 
play is still not completely understood. This paper presents a rigorous 
runtime analysis of a non-elitist population-based EA that uses the linear 
ranking selection mechanism. The analysis focuses on how the balance between
parameter $\eta$, controlling the selection pressure in linear ranking, and
parameter $\chi$, controlling the bit-wise mutation rate, impacts the runtime
of the algorithm. The results point out situations where a correct balance 
between selection pressure and mutation rate is essential for finding the 
optimal solution in polynomial time. In particular, it is shown that there 
exist fitness functions which can only be solved in polynomial time if the 
ratio between parameters $\eta$ and $\chi$ is within a narrow critical interval, and
where a small change in this ratio can increase the runtime exponentially. 
Furthermore, it is shown quantitatively how the appropriate parameter 
choice depends on the characteristics of the fitness function. In addition 
to the original results on the runtime of EAs, this paper also introduces 
a very useful analytical tool, i.e., multi-type branching processes, to the 
runtime analysis of non-elitist population-based EAs. 

1 Introduction 

Evolutionary algorithms (EAs) have been applied successfully to many opti- 
misation problems [53]. However, despite several decades of research, many 
fundamental questions about their behaviour remain open. One of the central 
questions regarding EAs is to understand the interplay between the selection 
mechanism and the genetic operators. Several authors have suggested that EAs 
must find a balance between maintaining a sufficiently diverse population to
explore new parts of the search space, and at the same time exploit the currently
best found solutions by focusing the search in this direction [5J UHl [S].
In fact, the trade-off between exploration and exploitation has been a common 
theme not only in evolutionary computation, but also in operations research 
and artificial intelligence in general. However, few theoretical studies actually 
exist that explain how to define such trade-off quantitatively and how to achieve 
it. Our paper can be regarded as one of the first rigorous runtime analyses of 
EAs that addresses the interaction between exploration, driven by mutation, 
and exploitation, driven by selection. 

Much research has focused on finding measures to quantify the selection 
pressure in selection mechanisms — without taking into account the genetic op- 
erators — and subsequently on investigating how EA parameters influence these 
measures [SJ [U |3 123 12 . One such measure, called the take-over time, considers 
the behaviour of an evolutionary process consisting only of the selection step, 
and no crossover or mutation operators [HI H]. Subsequent populations are produced
by selecting individuals from the previous generation, keeping at least
one copy of the fittest individual. Hence, the population will after a certain 
number of generations only contain those individuals that were fittest in the 
initial population, and this time is called the take-over time. A short take-over 
time corresponds to a high selection pressure. Other measures of selection pres- 
sure consider properties of the distribution of fitness values in a population that 
is obtained by a single application of the selection mechanism to a population 
with normally distributed fitness values. One of these properties is the selection 
intensity, which is the difference between the average population fitness before 
and after selection [25]. Other properties are loss of diversity [2], [20] and higher
order cumulants of the fitness distribution [3]. 

To completely understand the role of selection mechanisms, it is necessary 
to also take into account their interplay with the genetic operators. There exist 
few rigorous studies of selection mechanisms when used in combination with ge- 
netic operators. Happ et al. considered fitness proportionate selection, which is 
one of the first selection mechanisms to be employed in evolutionary algorithms 
[11]. Early research in evolutionary computation pointed out that this selection
mechanism suffers from various deficiencies, including population stagnation
due to low selective pressure [2^. Indeed, the results by Happ et al. show that
variants of the RLS and the (1+1) EA that use fitness-proportional selection
have exponential runtime on the class of linear functions [11]. Their analysis was
limited to single-individual based EAs. Neumann et al. showed that even with
a population-based EA, the OneMax problem cannot be optimised in polynomial
time with fitness proportional selection [22]. However, they pointed out
that polynomial runtime can be achieved by scaling the fitness function. Witt 
also studied a population-based algorithm with fitness proportionate selection, 
however with the objective to study the role of populations [JT]. Chen et al. 
analysed the (N+N) EA to compare its runtimes with truncation selection, 
linear ranking selection and binary tournament selection on the LeadingOnes 
and OneMax problems [4]. They found the expected runtime on these fitness 
functions to be the same for all three selection mechanisms. None of the results
above show how the balance between the selection pressure and mutation rate 
impacts the runtime. 

This paper analyses rigorously a non-elitist, population based EA that uses 
linear ranking selection and bit-wise mutation. The main contributions are an 
analysis of situations where the mutation-selection balance has an exponentially 
large impact on the runtime, and new techniques based on branching processes 
for analysing non-elitist population based EAs. The paper is based on preliminary
work reported in [18], which contained the first rigorous runtime analysis
of a non-elitist, population based EA with stochastic selection. This paper sig- 
nificantly extends this early work. In addition to strengthening the main result, 
simplifying several proofs and proving a conjecture, we have added a completely 
new section that introduces multi-type branching processes as an analytical tool 
for studying the runtime of EAs. 

1.1 Notation and Preliminaries 

The following notation will be used in the rest of this paper. The length of
a bitstring $x$ is denoted $\ell(x)$. The $i$-th bit, $1 \le i \le \ell(x)$, of a bitstring $x$ is
denoted $x_i$. The concatenation of two bitstrings $x$ and $y$ is denoted by $x \cdot y$ and
$xy$. Given a bitstring $x$, the notation $x[i, j]$, where $1 \le i < j \le \ell(x)$, denotes
the substring $x_i x_{i+1} \cdots x_j$. For any bitstring $x$, define $\|x\| := \sum_{i=1}^{\ell(x)} x_i / \ell(x)$,
i.e. the fraction of 1-bits in the bitstring. We say that an event holds with
overwhelmingly high probability (w.o.p.) with respect to a parameter $n$, if the
probability of the event is bounded from below by $1 - e^{-\Omega(n)}$.

In contrast to classical algorithms, the runtime of EAs is usually measured in 
terms of the number of evaluations of the fitness function, and not the number 
of basic operations. For a given function and algorithm, the expected runtime is 
defined as the mean number of fitness function evaluations until the optimum 
is evaluated for the first time. The runtime on a class of fitness functions is 
defined as the supremum of the expected runtimes of the functions in the class 
[7]. The variable name $\tau$ will be used to denote the runtime in terms of number
of generations of the EA. In the case of EAs that are initialised with a population
of $\lambda$ individuals, and which in each generation produce $\lambda$ offspring, variable $\tau$
can be related to the runtime $T$ by $\lambda(\tau - 1) \le T \le \lambda\tau$.

2 Definitions 

2.1 Linear Ranking Selection 

In ranking selection, individuals are selected according to their fitness rank 
in the population. A ranking selection mechanism is uniquely defined by the 
probabilities $p_i$ of selecting an individual ranked $i$, for all ranks $i$ [2]. For
mathematical convenience, an alternative definition due to Goldberg and Deb
[5] is adopted, in which a function $\alpha : [0, 1] \to \mathbb{R}$ is considered a ranking function
if it is non-increasing, and satisfies the following two conditions

1. $\alpha(x) \ge 0$, and

2. $\int_0^1 \alpha(y)\,dy = 1$.

Individuals are ranked from 0 to 1, with the best individual ranked 0, and
the worst individual ranked 1. For a given ranking function $\alpha$, the integral
$\beta(x, y) := \int_x^y \alpha(z)\,dz$ gives the probability of selecting an individual with rank
between $x$ and $y$. By defining the linearly decreasing ranking function
$\alpha(x) := \eta - cx$, where $\eta$ and $c$ are parameters, one obtains linear ranking selection. The
first condition implies that $\eta \ge c \ge 0$, and the second condition implies that
$c = 2(\eta - 1)$. Hence, for linear ranking selection, we have

$\alpha(x) := \eta(1 - 2x) + 2x$, and (1)
$\beta(x) := \beta(0, x) = x(\eta(1 - x) + x)$. (2)

Note that since $\alpha$ is non-increasing, i.e., $\alpha'(x) \le 0$, we must have $\eta \ge 1$. Also,
the special case $\alpha(1) \ge 0$ of the first condition implies that $\eta \le 2$. The take-over
time, which is used here to measure the selection pressure, is uniquely determined by, and
monotonically decreasing in, the parameter $\eta$ [9]. The weakest selection pressure
is obtained for $\eta = 1$, where selection is uniform over the population, and the
highest selection pressure is obtained for $\eta = 2$. We therefore assume that
$1 < \eta \le 2$.
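
Eqs. (1) and (2) are simple enough to evaluate directly. The following is a minimal Python sketch and not part of the original analysis. The function names `alpha`, `beta` and `beta_inverse` are illustrative, and inverting $\beta$ in closed form is only one assumed way of obtaining ranks with the required distribution.

```python
import math


def alpha(x: float, eta: float) -> float:
    """Linear ranking function of Eq. (1): alpha(x) = eta*(1 - 2x) + 2x."""
    return eta * (1.0 - 2.0 * x) + 2.0 * x


def beta(x: float, eta: float) -> float:
    """Cumulative selection probability of Eq. (2): beta(x) = x*(eta*(1 - x) + x)."""
    return x * (eta * (1.0 - x) + x)


def beta_inverse(u: float, eta: float) -> float:
    """Return the rank x in [0, 1] with beta(x) = u (assumed helper, not in the paper).

    beta(x) = u is the quadratic (eta - 1)*x^2 - eta*x + u = 0; the root
    with the minus sign is the one that lies in [0, 1].
    """
    if eta == 1.0:
        return u  # uniform selection: beta(x) = x
    disc = eta * eta - 4.0 * (eta - 1.0) * u
    return (eta - math.sqrt(disc)) / (2.0 * (eta - 1.0))
```

Feeding a uniform random number in $[0, 1)$ to `beta_inverse` yields a rank whose cumulative distribution is $\beta$, which is one possible realisation of the sampling mechanism assumed later for the algorithm.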

2.2 Evolutionary Algorithm 



Algorithm 1 Linear Ranking EA [18]

 1: $t \leftarrow 0$.
 2: for $i = 1$ to $\lambda$ do
 3:     Sample $x$ uniformly at random from $\{0, 1\}^n$.
 4:     $P_0(i) \leftarrow x$.
 5: end for
 6: repeat
 7:     Sort $P_t$ according to fitness $f$, such that
        $f(P_t(1)) \ge f(P_t(2)) \ge \cdots \ge f(P_t(\lambda))$.
 8:     for $i = 1$ to $\lambda$ do
 9:         Sample $r$ in $\{1, \ldots, \lambda\}$ with $\Pr(r \le \gamma\lambda) = \beta(\gamma)$.
10:         $P_{t+1}(i) \leftarrow P_t(r)$.
11:         Flip each bit position in $P_{t+1}(i)$ with prob. $\chi/n$.
12:     end for
13:     $t \leftarrow t + 1$.
14: until termination condition met.



We consider a population-based non-elitist EA which uses linear ranking
as selection mechanism. The crossover operator will not be considered in this
paper. The pseudo-code of the algorithm is given above. After sampling the
initial population $P_0$ at random in lines 1 to 5, the algorithm enters its main
loop where the current population $P_t$ in generation $t$ is sorted according to
fitness, then the next population $P_{t+1}$ is generated by independently selecting
(lines 9 and 10) and mutating (line 11) individuals from the previous population $P_t$.
The analysis of the algorithm is based on the assumption that parameter $\chi$ is a
constant with respect to $n$.

Figure 1: Illustration of optimal search points [18].

Linear ranking selection is indicated in line 9, where for a given selection
pressure $\eta$, the cumulative probability of sampling individuals with rank less
than $\gamma\lambda$ is $\beta(\gamma)$. It can be seen from the definition of the functions $\alpha$ and
$\beta$, that the upper bound $\beta(\gamma, \gamma + \delta) \le \delta \cdot \alpha(\gamma)$ holds for any $\gamma, \delta > 0$ where
$\gamma + \delta \le 1$. Hence, the expected number of times a uniformly chosen individual
ranked between $\gamma\lambda$ and $(\gamma + \delta)\lambda$ is selected during one generation is upper
bounded by $(\lambda/(\delta\lambda)) \cdot \beta(\gamma, \gamma + \delta) \le \alpha(\gamma)$. We leave the implementation details of
the sampling strategy unspecified, and assume that the EA has access to some
sampling mechanism which draws samples perfectly according to $\beta$.
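
The pseudo-code of Algorithm 1 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the implementation analysed in the paper: ranks are drawn by inverse transform sampling (the paper leaves the sampling strategy unspecified), the termination condition is a fixed number of generations, and the names `beta_inverse`, `linear_ranking_ea` and `leading_ones` are hypothetical.

```python
import math
import random
from typing import Callable, List


def beta_inverse(u: float, eta: float) -> float:
    """Rank x in [0, 1] with beta(x) = x*(eta*(1 - x) + x) = u (assumed helper)."""
    if eta == 1.0:
        return u
    disc = eta * eta - 4.0 * (eta - 1.0) * u
    return (eta - math.sqrt(disc)) / (2.0 * (eta - 1.0))


def leading_ones(x: List[int]) -> int:
    """Example fitness function: number of leading 1-bits."""
    count = 0
    for bit in x:
        if bit != 1:
            break
        count += 1
    return count


def linear_ranking_ea(fitness: Callable[[List[int]], float], n: int, lam: int,
                      eta: float, chi: float, generations: int,
                      rng: random.Random) -> List[List[int]]:
    """Run the Linear Ranking EA for a fixed number of generations (assumed stop rule)."""
    # Lines 1 to 5: uniform random initial population.
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(lam)]
    for _ in range(generations):
        # Line 7: sort by fitness, best individual first.
        pop.sort(key=fitness, reverse=True)
        nxt = []
        for _ in range(lam):
            # Line 9: draw a rank with cumulative distribution beta.
            rank = beta_inverse(rng.random(), eta)
            r = min(int(rank * lam), lam - 1)  # 0-based index into sorted pop
            # Lines 10 and 11: copy, then flip each bit with probability chi/n.
            child = [bit ^ (rng.random() < chi / n) for bit in pop[r]]
            nxt.append(child)
        pop = nxt
    return pop
```

A call such as `linear_ranking_ea(leading_ones, n=12, lam=16, eta=1.5, chi=1.0, generations=5, rng=random.Random(0))` returns the population after five generations. Because the algorithm is non-elitist, the best individual of one generation is not guaranteed to survive into the next.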



2.3 Fitness Function 

Definition 1. For any constants $\sigma, \delta$, $0 < \delta < \sigma < 1 - 3\delta$, and integer $k \ge 1$,
define the function

$$\mathrm{SelPres}_{\sigma,\delta,k}(x) := \begin{cases} 2n & \text{if } x \in X^*, \text{ and} \\ \sum_{i=1}^{n} \prod_{j=1}^{i} x_j & \text{otherwise,} \end{cases}$$

where the set of optimal solutions $X^*$ is defined to contain all bitstrings
$x \in \{0, 1\}^n$ satisfying

$$\|x[1, k+3]\| = 0,$$
$$\|x[k+4, (\sigma - \delta)n - 1]\| = 1, \text{ and}$$
$$\|x[(\sigma + \delta)n, (\sigma + 2\delta)n - 1]\| \le 2/3.$$

Except for the set of globally optimal solutions $X^*$, the fitness function
takes the same values as the well known LeadingOnes fitness function, i.e.
the number of leading 1-bits in the bitstring. The form of the optimal search
points, which is illustrated in Fig. 1, depends on the three problem parameters
$\sigma$, $k$ and $\delta$. The $\delta$-parameter is needed for technical reasons and can be set
to any positive constant arbitrarily close to 0. Hence, the globally optimal
solutions have approximately $\sigma n$ leading 1-bits, except for $k + 3$ leading 0-bits.
In addition, globally optimal search points must have a short interval after the
first $\sigma n$ bits which does not contain too many 1-bits.
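
Definition 1 can be made concrete with a short Python sketch. It is an illustration only: the function names are hypothetical, and rounding $(\sigma - \delta)n$, $(\sigma + \delta)n$ and $(\sigma + 2\delta)n$ to the nearest integer is an assumption, since the definition implicitly treats these quantities as integers.

```python
from typing import Sequence


def density(bits: Sequence[int]) -> float:
    """Fraction of 1-bits, i.e. the norm ||x|| from the notation section."""
    return sum(bits) / len(bits)


def leading_ones(x: Sequence[int]) -> int:
    """Number of leading 1-bits."""
    count = 0
    for bit in x:
        if bit != 1:
            break
        count += 1
    return count


def is_optimal(x: Sequence[int], sigma: float, delta: float, k: int) -> bool:
    """Membership test for X*; index rounding is an assumption of this sketch."""
    n = len(x)
    a = round((sigma - delta) * n)
    b = round((sigma + delta) * n)
    c = round((sigma + 2 * delta) * n)
    if not (k + 3 < a - 1 and b < c):
        raise ValueError("n is too small for these parameters")
    # The 1-indexed inclusive substring x[i, j] is the Python slice x[i-1:j].
    prefix = x[0:k + 3]        # x[1, k+3]
    middle = x[k + 3:a - 1]    # x[k+4, (sigma-delta)n - 1]
    window = x[b - 1:c - 1]    # x[(sigma+delta)n, (sigma+2delta)n - 1]
    return (density(prefix) == 0 and density(middle) == 1
            and density(window) <= 2 / 3)


def selpres(x: Sequence[int], sigma: float, delta: float, k: int) -> int:
    """SelPres: 2n on optimal points, LeadingOnes elsewhere."""
    if is_optimal(x, sigma, delta, k):
        return 2 * len(x)
    return leading_ones(x)
```

For $n = 20$, $\sigma = 0.5$, $\delta = 0.1$ and $k = 1$, a bitstring with four leading 0-bits, followed by three 1-bits and then only 0-bits, is optimal and receives fitness $2n = 40$, whereas the all-ones string receives its LeadingOnes value of 20.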



3 Main Result 



Theorem 2. For any constant integer $k \ge 1$, let $T$ be the runtime of the
Linear Ranking EA with population size $n \le \lambda \le n^k$, with a constant selection
pressure of $\eta$, $1 < \eta \le 2$, and bit-wise mutation rate $\chi/n$, for a constant $\chi > 0$,
on function $\mathrm{SelPres}_{\sigma,\delta,k}$ with parameters $\sigma$ and $\delta$, where $0 < \delta < \sigma < 1 - 3\delta$.
Let $\epsilon > 0$ be any constant.

1. If $\eta < \exp(\chi(\sigma - \delta)) - \epsilon$, then for some constant $c > 0$,

$$\Pr(T \ge e^{cn}) = 1 - e^{-\Omega(n)}.$$

2. If $\eta = \exp(\chi\sigma)$, then

$$\Pr(T \le n^{k+4}) = 1 - e^{-\Omega(n)}.$$

3. If $\eta > (2\exp(\chi(\sigma + 3\delta)) - 1)/(1 - \delta)$, then

$$E[T] = e^{\Omega(n)}.$$

Proof. The theorem follows from Theoreni[Bl Theoremfl^ and Corollary[2Dl □

Theorem 2 describes how the runtime of the Linear Ranking EA on fitness
function $\mathrm{SelPres}_{\sigma,\delta,k}$ depends on the main problem parameters $\sigma$ and $k$, the
mutation rate $\chi$ and the selection pressure $\eta$. The theorem is illustrated in Figure 2
for problem parameter $\sigma = 1/2$. Each point in the grey area indicates that
for the corresponding values of mutation rate $\chi$ and selection pressure $\eta$, the
EA has either expected exponential runtime or exponential runtime with
overwhelming probability (i.e. is highly inefficient). The thick line indicates values
of $\chi$ and $\eta$ where the runtime of the EA is polynomial with overwhelmingly high
probability (i.e. is efficient). The runtime in the white regions is not analysed.

Figure 2: Illustration of the main result (Theorem 2), indicating the runtime of
the EA on $\mathrm{SelPres}_{\sigma,\delta,k}$ for problem parameter $\sigma = 1/2$, as a function of the
mutation rate $\chi$ (horizontal axis) and the selection pressure $\eta$ (vertical axis).

The theorem and the figure indicate that setting one of the two parameters
of the algorithm (i.e. $\eta$ or $\chi$) independently of the other parameter is insufficient
to guarantee polynomial runtime. For example, setting the selection pressure
parameter to $\eta := 3/2$ only yields polynomial runtime for certain settings of
the mutation rate parameter $\chi$, while it leads to exponential runtime for other
settings of the mutation rate parameter. Hence, it is rather the balance between
the mutation rate $\chi$ and the selection pressure $\eta$, i.e. the mutation-selection
balance, that determines the runtime for the Linear Ranking EA on this problem.
More specifically, a too high setting of the selection pressure parameter $\eta$ can
be compensated by increasing the mutation rate parameter $\chi$. Conversely, a too
low parameter setting for the mutation rate $\chi$ can be compensated by decreasing
the selection pressure parameter $\eta$. Furthermore, the theorem shows that
the runtime can be highly sensitive to the parameter settings. Notice that the
margins between the different runtime regimes are determined by the two
parameters $\epsilon$ and $\delta$ that can be set to any constants arbitrarily close to 0. Hence,
decreasing the selection pressure below $\exp(\chi\sigma)$ by any constant, or increasing
the mutation rate above $\ln(\eta)/\sigma$ by any constant, will increase the runtime from
polynomial to exponential. Finally, note that the optimal mutation-selection
balance $\eta = \exp(\chi\sigma)$ depends on the problem parameter $\sigma$. Hence, there exists
no problem-independent optimal balance between the selection pressure and the
mutation rate.
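
The three conditions of Theorem 2 can be checked numerically for concrete parameter values. The sketch below is only an illustration of how the regimes relate to each other; the function name `runtime_regime` and its return labels are hypothetical, and the exact equality in case 2 is tested with a floating-point tolerance.

```python
import math


def runtime_regime(eta: float, chi: float, sigma: float,
                   delta: float, eps: float) -> str:
    """Classify (eta, chi) according to the three cases of Theorem 2."""
    low = math.exp(chi * (sigma - delta)) - eps
    high = (2.0 * math.exp(chi * (sigma + 3.0 * delta)) - 1.0) / (1.0 - delta)
    if eta < low:
        return "case 1"       # exponential runtime w.o.p.
    if math.isclose(eta, math.exp(chi * sigma)):
        return "case 2"       # polynomial runtime w.o.p.
    if eta > high:
        return "case 3"       # exponential expected runtime
    return "not covered"      # the theorem makes no statement here
```

With $\sigma = 1/2$ and $\chi = 1$, the balanced selection pressure is $\eta = e^{1/2} \approx 1.649$. A selection pressure of $\eta = 1.2$ falls into case 1 for small $\delta$ and $\epsilon$, while for the lower mutation rate $\chi = 0.2$ the setting $\eta = 1.5$ falls into case 3.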

Before proving Theorem 2, we mention that previous analyses have also
shown that the runtime of randomised search heuristics can depend critically
on the parameter settings. In the case of EAs, it is known that the population
size is important [12], [15], [30]. In fact, even small changes to the population size
can lead to an exponential increase in the runtime [27], [31]. Another example is
the evaporation factor in Ant Colony Optimisation, where a small change can
increase the runtime from polynomial to exponential |23( [3 [6]. A distinguishing
aspect of the result in this paper is that the runtime is here shown to depend
critically on the relationship between two parameters of the algorithm.

4 Runtime Analysis 

This section gives the proof of Theorem 2. The analysis is conceptually divided
into two parts. In Sections 4.1 and 4.2, the behaviour of the main "core" of the
population is analysed, showing that the population enters an equilibrium state.
This analysis is sufficient to prove the polynomial upper bound in Theorem 2.
Sections 4.3 and 4.4 analyse the behaviour of the "stray" individuals that
sometimes move away from the core of the population. This analysis is necessary to
prove the exponential lower bound in Theorem 2.

Figure 3: Impact of one generation of selection and mutation from the point of
view of the $\gamma$-ranked individual in population $P_t$ [18].

4.1 Population Equilibrium

As long as the global optimum has not been found, the population is evolving 
with respect to the number of leading 1-bits. In the following, we will prove that 
the population eventually reaches an equilibrium state in which the population 
makes no progress with respect to the number of leading 1-bits. The population 
equilibrium can be explained informally as follows. On one hand, the selection 
mechanism increases the number of individuals in the population that have a 
relatively high number of leading 1-bits. On the other hand, the mutation op- 
erator may flip one of the leading 1-bits, and the probability of doing so clearly 
increases with the number of leading 1-bits in the individual. Hence, the selec- 
tion mechanism causes an influx of individuals with a high number of leading 
1-bits, and the mutation causes an efflux of individuals with a high number of 
leading 1-bits. At a certain point, the influx and efflux reach a balance which 
is described in the field of population genetics as mutation-selection balance. 

Our first goal will be to describe the population when it is in the equilibrium 
state. This is done rigorously by considering each generation as a sequence 
of A Bernoulli trials, where each trial consists of selecting an individual from 
the population and then mutating that individual. Each trial has a certain 
probability of being successful in a sense that will be described later, and the 
progress of the population depends on the sum of successful trials, i.e. the 
population progress is a function of a certain Bernoulli process. 

4.1.1 Ranking Selection as a Bernoulli Process

We will associate a Bernoulli process with the selection step in any given
generation of the non-elitist EA, similar to Chen et al. [4]. For notational convenience,
the individual that has rank $\gamma\lambda$ in a given population will be called the $\gamma$-ranked
individual of that population. For any constant $\gamma$, $0 < \gamma < 1$, assume that the
$\gamma$-ranked individual has $f_0 := \xi n$ leading 1-bits for some constant $\xi$. As illustrated
in Fig. 3, the population can be partitioned into three groups of individuals:
$\lambda^+$-individuals with fitness higher than $f_0$, $\lambda^0$-individuals with fitness equal to $f_0$,
and $\lambda^-$-individuals with fitness less than $f_0$. Clearly, $\lambda^+ + \lambda^0 + \lambda^- = \lambda$, and
$0 \le \lambda^+ < \gamma\lambda$.

The following theorem makes a precise statement about the position
$\xi^* = \ln(\beta(\gamma)/\gamma)/\chi$ for a given rank $\gamma$, $0 < \gamma < 1$, in which the population equilibrium
occurs. Informally, the theorem states that the number of leading 1-bits in the
$\gamma$-ranked individual is unlikely to decrease when it is below $\xi^* n$, and is unlikely
to increase when it is above $\xi^* n$.

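Theorem 3. For any constant $\gamma$, $0 < \gamma < 1$, and any $t_0 > 0$, define for all $t \ge 0$
the random variable $L_t$ as the number of leading 1-bits in the $\gamma$-ranked individual
in generation $t_0 + t$. For any $t \le e^{c\lambda}$, define $T^* := \min\{t, T - t_0\}$, where $T$ is
the number of generations until an optimal search point is found. Furthermore,
for any constant mutation rate $\chi > 0$, define $\xi^* := \ln(\beta(\gamma)/\gamma)/\chi$, where the
function $\beta(\gamma)$ is as given in Eq. (2). Then for any constant $\delta$, $0 < \delta < \xi^*$, it
holds that

$$\Pr\Big( \min\{\xi_0 n, (\xi^* - \delta)n\} > \min_{0 \le i \le T^*} L_i \;\Big|\; L_0 \ge \xi_0 n \Big) = e^{-\Omega(\lambda)},$$

$$\Pr\Big( \max\{\xi_0 n, (\xi^* + \delta)n\} < \max_{0 \le i \le T^*} L_i \;\Big|\; L_0 \le \xi_0 n \Big) = e^{-\Omega(\lambda)},$$

where $c > 0$ is some constant.

Proof. For the first statement, define $\xi := \min\{\xi_0, \xi^* - \delta\}$. Consider the events
$\mathcal{F}_j^-$ and $\mathcal{G}_j^-$, defined for $j$, $0 \le j < t$, by

$$\mathcal{F}_j^- : L_{j+1} < \xi n, \quad \text{and} \quad \mathcal{G}_j^- : \min_{0 \le i \le j} L_i \ge \xi n.$$

The first probability in the theorem can now be expressed as

$$\begin{aligned}
\Pr\Big( \bigcup_{0 \le j < t} \mathcal{F}_j^- \wedge \mathcal{G}_j^- \;\Big|\; L_0 \ge \xi_0 n \Big)
&\le \sum_{j=0}^{t-1} \Pr\big( \mathcal{F}_j^- \wedge \mathcal{G}_j^- \mid L_0 \ge \xi_0 n \big) \\
&\le \sum_{j=0}^{t-1} \Pr\big( \mathcal{F}_j^- \mid \mathcal{G}_j^- \wedge L_0 \ge \xi_0 n \big),
\end{aligned}$$

where the first inequality follows from the union bound. The second inequality
follows from the definition of conditional probability, which is well-defined in
this case because $\Pr(\mathcal{G}_j^- \mid L_0 \ge \xi_0 n) > 0$ clearly holds.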

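To prove the first statement of the theorem, it now suffices to choose a not
too large constant $c$, and show that for all $j$, $0 \le j < t$,

$$\Pr\big( \mathcal{F}_j^- \mid \mathcal{G}_j^- \wedge L_0 \ge \xi_0 n \big) = e^{-\Omega(\lambda)}.$$

To show this, we consider each iteration of the selection mechanism in
generation $j$ as a Bernoulli trial, where a trial is successful if the following event
occurs:

$\mathcal{E}_1^+$: An individual with at least $\xi n$ leading 1-bits is selected, and none of the
initial $\xi n$ bits are flipped.

Let the random variable $X$ denote the number of successful trials. Notice that
the event $X \ge \gamma\lambda$ implies that the $\gamma$-ranked individual in the next generation
has at least $\xi n$ leading 1-bits, i.e., that event $\mathcal{F}_j^-$ does not occur. From the
assumption that $\xi \le \ln(\beta(\gamma)/\gamma)/\chi - \delta$, we get

$$\frac{\beta(\gamma)}{e^{\xi\chi}} \ge \gamma \cdot e^{\delta\chi}.$$

Hence it follows that

$$\begin{aligned}
E\big[ X \mid \mathcal{G}_j^- \wedge L_0 \ge \xi_0 n \big]
&= \lambda \cdot \Pr\big( \mathcal{E}_1^+ \mid \mathcal{G}_j^- \wedge L_0 \ge \xi_0 n \big) \\
&\ge \beta(\gamma)\lambda \cdot \Big(1 - \frac{\chi}{n}\Big)^{\xi n} \\
&\ge \beta(\gamma)\lambda \cdot \Big(1 - \frac{\chi^2}{n}\Big) \cdot e^{-\xi\chi} \\
&\ge \gamma\lambda \cdot \Big(1 - \frac{\chi^2}{n}\Big) \cdot e^{\delta\chi} \\
&\ge \gamma\lambda \cdot (1 + \delta\chi) \cdot \Big(1 - \frac{\chi^2}{n}\Big).
\end{aligned}$$

For sufficiently large $n$, a Chernoff bound [H] therefore implies that

$$\Pr\big( X < \gamma\lambda \mid \mathcal{G}_j^- \wedge L_0 \ge \xi_0 n \big) = e^{-\Omega(\lambda)}.$$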

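For the second statement, define $\xi := \max\{\xi_0, \xi^* + \delta\}$. Consider the events
$\mathcal{F}_j^+$ and $\mathcal{G}_j^+$, defined for $j$, $0 \le j < t$, by

$$\mathcal{F}_j^+ : L_{j+1} > \xi n, \quad \text{and} \quad \mathcal{G}_j^+ : \max_{0 \le i \le j} L_i \le \xi n.$$

Similarly to above, the second statement can be proved by showing that

$$\Pr\big( \mathcal{F}_j^+ \mid \mathcal{G}_j^+ \wedge L_0 \le \xi_0 n \big) = e^{-\Omega(\lambda)}$$

for all $j$, $0 \le j < t$. To show this, we define a trial in generation $j$ successful if
one of the following two events occurs:

$\mathcal{E}_2^+$: An individual with at least $\xi n + 1$ leading 1-bits is selected, and none of
the initial $\xi n + 1$ bits are flipped.

$\mathcal{E}_2^-$: An individual with less than $\xi n + 1$ leading 1-bits is selected, and the
mutation of this individual creates an individual with at least $\xi n + 1$
leading 1-bits.

Let the random variable $Y$ denote the number of successful trials. Notice that
the event $Y < \gamma\lambda$ implies that the $\gamma$-ranked individual in the next generation has
no more than $\xi n$ leading 1-bits, i.e., that event $\mathcal{F}_j^+$ does not occur. Furthermore,
since the $\gamma$-ranked individual in the current generation has no more than $\xi n$
leading 1-bits, less than $\gamma\lambda$ individuals have more than $\xi n$ leading 1-bits. Hence,
the event $\mathcal{E}_2^+$ occurs with probability

$$\Pr\big( \mathcal{E}_2^+ \mid \mathcal{G}_j^+ \wedge L_0 \le \xi_0 n \big) \le \beta(\gamma) \Big(1 - \frac{\chi}{n}\Big)^{\xi n + 1} \le \frac{\beta(\gamma)}{e^{\xi\chi}}.$$

If the selected individual has $k \ge 1$ 0-bits within the first $\xi n + 1$ bit positions,
then the probability of mutating this individual into an individual with at least
$\xi n + 1$ leading 1-bits, and hence also the probability of event $\mathcal{E}_2^-$, is bounded
from above by

$$\Pr\big( \mathcal{E}_2^- \mid \mathcal{G}_j^+ \wedge L_0 \le \xi_0 n \big) \le \Big(\frac{\chi}{n}\Big)^{k} \Big(1 - \frac{\chi}{n}\Big)^{\xi n + 1 - k} \le \frac{\chi}{n e^{\xi\chi}}.$$

From the assumption that $\xi \ge \ln(\beta(\gamma)/\gamma)/\chi + \delta$, we get

$$\frac{\beta(\gamma)}{e^{\xi\chi}} \le \gamma \cdot e^{-\delta\chi}.$$

Hence, for any constant $\delta'$, $0 < \delta' < 1 - e^{-\delta\chi} < 1$, we have

$$\begin{aligned}
E\big[ Y \mid \mathcal{G}_j^+ \wedge L_0 \le \xi_0 n \big]
&= \lambda \cdot \Pr\big( \mathcal{E}_2^+ \mid \mathcal{G}_j^+ \wedge L_0 \le \xi_0 n \big)
 + \lambda \cdot \Pr\big( \mathcal{E}_2^- \mid \mathcal{G}_j^+ \wedge L_0 \le \xi_0 n \big) \\
&\le \lambda \Big( \beta(\gamma) + \frac{\chi}{n} \Big) \cdot e^{-\xi\chi} \\
&\le \gamma\lambda \Big( 1 + \frac{\chi}{n\beta(\gamma)} \Big) \cdot e^{-\delta\chi}.
\end{aligned}$$

For sufficiently large $n$, the last expression is at most $(1 - \delta')\gamma\lambda$, and a Chernoff
bound therefore implies that

$$\Pr\big( Y \ge \gamma\lambda \mid \mathcal{G}_j^+ \wedge L_0 \le \xi_0 n \big) = e^{-\Omega(\lambda)}.$$

□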

In the following, we will say that the $\gamma$-ranked individual $x$ is in the
equilibrium position with respect to a given constant $\delta > 0$, if the number of leading
1-bits in individual $x$ is larger than $(\xi^* - \delta)n$, and smaller than $(\xi^* + \delta)n$, where
$\xi^* = \ln(\beta(\gamma)/\gamma)/\chi$.
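
The equilibrium position $\xi^* = \ln(\beta(\gamma)/\gamma)/\chi$ is easy to evaluate for concrete parameters. The following Python sketch is an illustration with hypothetical function names. Since $\beta(\gamma)/\gamma = \eta(1 - \gamma) + \gamma$, the position tends to $\ln(\eta)/\chi$ as $\gamma$ approaches 0, so the balanced setting $\eta = \exp(\chi\sigma)$ places the equilibrium of the best-ranked individuals close to $\sigma$.

```python
import math


def beta(gamma: float, eta: float) -> float:
    """Cumulative selection probability of Eq. (2)."""
    return gamma * (eta * (1.0 - gamma) + gamma)


def equilibrium_position(gamma: float, eta: float, chi: float) -> float:
    """xi* = ln(beta(gamma)/gamma)/chi, as a fraction of the bitstring length n."""
    return math.log(beta(gamma, eta) / gamma) / chi


if __name__ == "__main__":
    eta, chi = math.exp(0.5), 1.0  # balanced for sigma = 1/2
    for gamma in (1e-9, 0.1, 0.5, 1.0):
        print(gamma, equilibrium_position(gamma, eta, chi))
```

The printed positions decrease from roughly $1/2$ for the best ranks down to 0 for $\gamma = 1$, which reflects that lower-ranked individuals settle at fewer leading 1-bits.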

4.1.2 Drift Analysis in Two Dimensions

Theorem 3 states that when the population reaches a certain region of the search
space, the progress of the population will halt and the EA enters an equilibrium
state. Our next goal is to calculate the expected time until the EA enters the
equilibrium state. More precisely, for any constants $\gamma$, $0 < \gamma < 1$, and $\delta > 0$,
we would like to bound the expected number of generations until the fitness $f_0$
of the $\gamma$-ranked individual becomes at least $(\ln(\beta(\gamma)/\gamma)/\chi - \delta)n$. Although the
fitness $f_0$ will have a tendency to drift towards higher values, it is necessary
to take into account that the fitness can in general both decrease and increase
according to stochastic fluctuations.

Drift analysis has proven to be a powerful mathematical technique to analyse
such stochastically fluctuating processes [13]. Given a distance measure
(sometimes called potential function) from any search point to the optimum,
one estimates the drift $\Delta$ towards the optimum in one generation, and bounds
the expected time to overcome a distance of $b(n)$ by $b(n)/\Delta$.

However, in our case, a direct application of drift analysis with respect to $f_0$
will give poor bounds, because the drift of $f_0$ depends on the value of a second
variable $\lambda^+$. The probability of increasing the fitness of the $\gamma$-ranked individual
is low when the number of individuals in the population with higher fitness, i.e.
$\lambda^+$, is low. However, it is still likely that the sum $\lambda^0 + \lambda^+$ will increase, thus
increasing the number of good individuals in the population.

Several researchers have discussed this alternating behaviour of population-based
EAs [30], [4]. Witt shows that by taking into account replication of good
individuals, one can improve on trivial upper runtime bounds for the $(\mu+1)$
EA, e.g. from $O(\mu n^2)$ on LeadingOnes into $O(\mu n \log n + n^2)$ [30]. Chen et al.
describe a similar situation in the case of an elitist EA, which goes through a
sequence of two-stage phases, where the first stage is characterised by
accumulation of leading individuals, and the second stage is characterised by acquiring
better individuals [4].

Generalised to the non-elitist EA described here, this corresponds to first
accumulation of $\lambda^+$-individuals, until one eventually gains more than $\gamma\lambda$
individuals with fitness higher than $f_0$. In the worst case, when $\lambda^+ = 0$, one expects
that $f_0$ has a small positive drift. However, when $\lambda^+$ is high, there is a high
drift. When the fitness is increased, the value of $\lambda^+$ is likely to decrease. To
take into account this mutual dependency between $\lambda^+$ and $f_0$, we apply drift
analysis in conceptually two dimensions, finding the drift of both $f_0$ and $\lambda^+$.
Similar in vein to this two-dimensional drift analysis is the analysis of simulated
annealing due to Wegener, in which a gambler's ruin argument is applied with
respect to a potential function having two components [28].

The drift analysis applies the following simple property of function $\beta$ which
follows from its definition in Eq. (2).

Lemma 4. For all $x \ge 1$, and $\gamma$, $0 < \gamma < 1$, the function $\beta$ defined in Eq. (2)
satisfies

$$\frac{\beta(\gamma/x)}{\beta(\gamma)} \ge \frac{1}{x}.$$
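
The property stated in Lemma 4 can be verified in one line from Eq. (2). The derivation below is a sketch added for the reader's convenience and uses only the definition $\beta(\gamma) = \gamma(\eta(1 - \gamma) + \gamma) = \gamma(\eta - (\eta - 1)\gamma)$ together with $x \ge 1$ and $\eta \ge 1$.

```latex
\[
  \frac{\beta(\gamma/x)}{\beta(\gamma)}
  = \frac{(\gamma/x)\,\bigl(\eta - (\eta - 1)\gamma/x\bigr)}
         {\gamma\,\bigl(\eta - (\eta - 1)\gamma\bigr)}
  = \frac{1}{x} \cdot
    \frac{\eta - (\eta - 1)\gamma/x}{\eta - (\eta - 1)\gamma}
  \;\ge\; \frac{1}{x},
\]
% The last step holds because gamma/x <= gamma for x >= 1 and eta - 1 >= 0,
% so the numerator of the second fraction is at least its denominator,
% and the denominator is positive for 0 < gamma < 1.
```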

The following theorem shows that if the $\gamma$-ranked individual in a given
population is below the equilibrium position, then the equilibrium position will be
reached within expected $O(\lambda n^2)$ function evaluations.

Theorem 5. Let $\gamma$ and $\delta$ be any constants with $0 < \gamma < 1$ and $\delta > 0$. The
expected number of function evaluations until the $\gamma$-ranked individual of the
Linear Ranking EA with population size $\lambda \ge c \ln n$, for some constant $c > 0$
that depends on $\gamma$, attains at least $n(\ln(\beta(\gamma)/\gamma)/\chi - \delta)$ leading 1-bits or the
optimum is reached, is $O(\lambda n^2)$.

Proof. Recall from the definition of the EA that $P_t$ is the population vector in
generation $t \ge 0$. We consider the drift with respect to the potential function
$h(P_t) := h_y(P_t) + \lambda h_x(P_t)$, which is composed of a horizontal component $h_x$, and a vertical
component $h_y$, defined as

$$h_x(P_t) := n - \mathrm{LeadingOnes}(x_{(\gamma)}),$$
$$h_y(P_t) := \gamma\lambda - |\{ y \in P_t \mid f(y) > f(x_{(\gamma)}) \}|,$$

where $x_{(\gamma)}$ is the $\gamma$-ranked individual in population $P_t$. The horizontal $\Delta_{x,t}$ and
vertical $\Delta_{y,t}$ drift in generation $t$ are

$$\Delta_{x,t}(i) := E\big[ h_x(P_t) - h_x(P_{t+1}) \mid h_x(P_t) = i \big], \text{ and}$$
$$\Delta_{y,t}(i) := E\big[ h_y(P_t) - h_y(P_{t+1}) \mid h_y(P_t) = i \big].$$

The horizontal and vertical drift will be bounded independently in the following
two cases,

1) $0 \le \lambda^+ < \gamma\lambda/l$, and

2) $\gamma\lambda/l \le \lambda^+$,

where $l$ is a constant that will be specified later.

Assume that the $\gamma$-ranked individual has $\xi n$ leading 1-bits, where it holds
$\xi < \ln(\beta(\gamma)/\gamma)/\chi - \delta$. By the first statement of Theorem 3, the probability
of reducing the number of leading 1-bits in the $\gamma$-ranked individual, i.e., of
increasing the horizontal distance, is $e^{-\Omega(\lambda)}$. The horizontal distance cannot
increase by more than $n$, so $\Delta_{x,t} \ge -n e^{-\Omega(\lambda)}$ holds in both cases.

We now bound the horizontal drift $\Delta_{x,t}$ for Case 2. Let the random variable
$S_t$ be the number of selection steps in which an individual with fitness strictly
higher than $f_0 = f(x_{(\gamma)})$ is selected, and none of the leading $\xi n$ bits are flipped.
Then

$$\begin{aligned}
E[S_t] &\ge \lambda \cdot \beta(\gamma/l) \cdot e^{-\xi\chi} \\
&\ge \gamma\lambda \cdot \frac{\beta(\gamma/l)}{\beta(\gamma)} \cdot (1 + \chi\delta) \\
&\ge \gamma\lambda \cdot \frac{(1 + \chi\delta)}{l}.
\end{aligned}$$

By defining $l := (1 + \chi\delta/2)$, there exists a constant $\delta' > 0$ such that for
sufficiently large $n$, we have $E[S_t] \ge (1 + \delta') \cdot \gamma\lambda$. Hence, by a Chernoff bound,
with probability $1 - e^{-\Omega(\lambda)}$, the number $S_t$ of such selection steps is at least $\gamma\lambda$,
in which case $\Delta_{x,t} \ge 1$. The unconditional horizontal drift in Case 2 therefore
satisfies $\Delta_{x,t} \ge 1 \cdot (1 - e^{-\Omega(\lambda)}) - n \cdot e^{-\Omega(\lambda)}$.

We now bound the vertical drift $\Delta_{y,t}$ for Case 1. In order to generate a
$\lambda^+$-individual in a selection step, it is sufficient that a $\lambda^+$-individual is selected and
none of the leading $\xi n + 1$ 1-bits are flipped. We first show that the expected
number of such events is sufficient to ensure a non-negative drift. If $\lambda^+ = 0$,
then the vertical drift cannot be negative. Let us therefore assume that
$0 < \lambda^+ = \gamma\lambda/m$ for some $m > 1$ which is not necessarily constant. The expected
number of times a new $\lambda^+$-individual is created is at least

$$\lambda \cdot \beta(\gamma/m) \cdot e^{-\xi\chi} \cdot \Big(1 - \frac{\chi}{n}\Big) \ge \gamma\lambda \cdot \frac{(1 + \chi\delta)}{m} \cdot \Big(1 - \frac{\chi}{n}\Big).$$

Hence, for sufficiently large $n$, this is at least $\lambda^+$, and the expected drift is
non-negative. In addition, a $\lambda^+$-individual can be created by selecting a
$\lambda^0$-individual, and flipping the first 0-bit and no other bits. The expected number
of such events is at least $\lambda \cdot \beta(\gamma/l, \gamma) \cdot e^{-\xi\chi} \cdot \chi/n = \Omega(\lambda/n)$. Hence, the expected
vertical drift in Case 1 is $\Omega(\lambda/n)$. Finally, for Case 2, we use the trivial lower
bound $\Delta_{y,t} \ge -\gamma\lambda$.

The horizontal and vertical drift is now added into a combined drift

$$\Delta_t := \Delta_{y,t} + \lambda\Delta_{x,t},$$

which in the two cases is bounded by

1) $\Delta_t = \Omega(\lambda/n) - \lambda n e^{-\Omega(\lambda)}$, and

2) $\Delta_t = -\gamma\lambda + \lambda(1 - e^{-\Omega(\lambda)}) - \lambda n e^{-\Omega(\lambda)}$.

Given a population size $\lambda \ge c \ln n$, for a sufficiently large constant $c$ with
respect to $\gamma$, the combined drift $\Delta_t$ is therefore in both cases bounded from
below by $\Omega(\lambda/n)$. The maximal distance is $b(n) \le (n + \gamma) \cdot \lambda$, hence, the expected
number of function evaluations $T$ until the $\gamma$-ranked individual attains at least
$n(\ln(\beta(\gamma)/\gamma)/\chi - \delta)$ leading 1-bits is no more than $E[T] \le \lambda \cdot b(n)/\Delta_t = O(\lambda n^2)$.

□

4.2 Mutation-Selection Balance



In the previous section, it was shown that the population reaches an equihbrium 
state in O(An^) function evaluations in expectation. Furthermore, the position 
of the equihbrium state is given by the selection pressure rj and the mutation rate 
X- By choosing appropriate values for the parameters rj and Xj one can ensure 
that the equilibrium position occurs close to the global optimum that is given 
by the problem parameter a. Theorem 1191 that will be proved in Section 14.51 
also implies that no individual will reach far beyond the equilibrium position. 
It is now straightforward to prove that an optimal solution will be found in 
polynomial time with overwhelmingly high probability. 

Theorem 6. The probability that Linear Ranking EA with population size $\lambda$,
$n \leq \lambda \leq n^k$, for any constant integer $k \geq 1$, selection pressure $\eta$, and
bit-wise mutation rate $\chi/n$ for a constant $\chi > 0$ satisfying $\eta = \exp(\sigma\chi)$,
finds the optimum of $\mathrm{SelPres}_{\sigma,\delta,k}$ within $n^{k+4}$ function evaluations
is $1 - e^{-\Omega(n)}$.

Proof. We divide the run into two phases. The first phase lasts the first $\lambda n^3$
function evaluations, and the second phase lasts the remaining $n^{k+4} - \lambda n^3$
function evaluations. We say that a failure occurs during the run, if within these
two phases, there exists an individual that has more than $(\sigma + \delta)n$ leading
1-bits, or more than $2\delta n/3$ 1-bits in the interval from $(\sigma + \delta)n$ to
$(\sigma + 2\delta)n$. We first claim that the probability of this failure event is
exponentially small. By Theorem 19, with probability $1 - e^{-\Omega(n)}$, no individual
reaches more than $(\sigma + \delta)n$ leading 1-bits within $n^{k+4}$ function evaluations.
Hence the bits after position $(\sigma + \delta)n$ will be uniformly distributed. By a
Chernoff bound, and a union bound over all the individuals in the two phases, the
probability that any individual during the two phases has more than $2\delta n/3$
1-bits in the interval from $n(\sigma + \delta)$ to $n(\sigma + 2\delta)$ is exponentially small.
We have therefore proved the first claim.

Let $\gamma > 0$ be a constant such that $\ln(\beta(\gamma)/\gamma)/\chi > \sigma - \delta$. We say that
a failure occurs in the first phase, if by the end of this phase, there exists a
non-optimal individual with rank between 0 and $\gamma$ that has fewer than
$(\sigma - \delta)n$ leading 1-bits. We will prove the claim that the probability of this
failure event is exponentially small. By Theorem 5, the expected number of
function evaluations until the $\gamma$-ranked individual has obtained at least
$(\sigma - \delta)n$ leading 1-bits is no more than $c\lambda n^2$, for some constant $c > 0$. We
divide the first phase into sub-phases, each of length $2c\lambda n^2$. By Markov's
inequality, the probability that the $\gamma$-ranked individual has not obtained
$(\sigma - \delta)n$ leading 1-bits within a given sub-phase is at most 1/2. The
probability that this number of leading 1-bits is not achieved within $n/2c$ such
sub-phases, i.e. by the end of the first phase, is no more than $2^{-n/2c}$, and
the second claim holds.
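
The sub-phase argument above can be checked numerically. The following is a minimal sketch, not part of the original analysis: the function name and the concrete values of $n$, $c$ and $\lambda$ are hypothetical and chosen only for illustration. It applies Markov's inequality to a single sub-phase and multiplies the bound over consecutive sub-phases, assuming the expected-time bound holds from every starting state.

```python
def subphase_failure_bound(expected_time: float, subphase_len: float,
                           num_subphases: int) -> float:
    """Upper bound on the probability that an event with expected waiting
    time at most `expected_time` (from any starting state) does not occur
    in any of `num_subphases` consecutive sub-phases of length `subphase_len`."""
    # Markov's inequality: Pr(T >= subphase_len) <= E[T] / subphase_len.
    per_phase = min(1.0, expected_time / subphase_len)
    return per_phase ** num_subphases


# Hypothetical values, for illustration only.
n, c, lam = 100, 2.0, 200
expected = c * lam * n ** 2      # E[T] <= c * lambda * n^2
length = 2 * expected            # sub-phase length 2 * c * lambda * n^2
phases = int(n / (2 * c))        # n / (2c) sub-phases
bound = subphase_failure_bound(expected, length, phases)
```

With these values the sub-phases fill exactly the $\lambda n^3$ evaluations of the first phase, and the bound evaluates to $2^{-n/2c}$.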

Figure 4: Non-selective family tree (triangle) of the family tree (gray) rooted in
individual x [18]. A label in the figure marks x* (global optimum).

We say that a failure occurs in the second phase, if a non-optimal individual
with rank better than $\gamma$ has fewer than $(\sigma - \delta)n$ leading 1-bits, or the
optimum is not found by the end of the phase. We claim that the probability of
this failure event is exponentially small. The first part of the claim follows
from the first part of Theorem 3 with the parameters $\xi_0 = \sigma - \delta$ and
$t = n^{k+4}/\lambda - n^3$. Assuming no failure in the previous phase, to produce the
optimum it suffices to select an individual with rank between 0 and $\gamma$, and flip
the leading $k + 3$ 1-bits, and no other bits. The probability that this event
happens during a single selection step, assuming that $n > 2\chi - k - 3$, i.e.,
$n - k - 3 < 2n - 2\chi$, is



The expected number of selection steps until the optimum is produced is
$1/r \leq c'n^{k+3}$, for some constant $c' > 0$. Similarly to the first phase, we
consider sub-phases, each of length $2c'n^{k+3}$. By Markov's inequality, the
probability that the optimum has not been found within a given sub-phase is at
most 1/2. The probability that the optimum has not been found within $n/4c'$
sub-phases, i.e. before the end of the second phase, is $2^{-n/4c'}$, and the third
claim holds.

If none of the failure events occurs, then the optimum has been found by the
end of the second phase. The probability that any of the failure events occurs
is $e^{-\Omega(n)}$, and the theorem then follows. □

4.3 Non-Selective Family Trees 

While Theorem 3 describes the equilibrium position of any $\gamma$-ranked individual
for any positive constant $\gamma$, the theorem cannot be used to analyse the behaviour
of single "stray" individuals, including the position of the fittest individual
(i.e. $\gamma = 0$). This is because the tail inequalities obtained by the Chernoff
bounds used in the proof of Theorem 3 are too weak for ranks of order $\gamma = o(1)$.

To analyse stray individuals, we will apply the technique of non-selective
family trees introduced in [18]. This technique is different from, but related
to, the family tree technique described by Witt [30]. A family tree has as its
root a given individual $x$ in some generation $t$, and the nodes in each level $k$
correspond to the subset of the population in generation $t + k$ defined in the
following way. An individual $y$ in generation $t + k$ is a member of the family
tree if and only if it was generated by selection and mutation of an individual
$z$ that belongs to level $k - 1$ of the family tree. In this case, individual $z$ is
the parent node of individual $y$. If there is a path from an individual $z$ at level
$k$ to an individual $y$ at level $k' > k$, then individual $y$ is said to be a descendant
of individual $z$, and individual $z$ is an ancestor of individual $y$. A directed path
in the family tree is called a lineage. A family tree is said to become extinct in
generation $t + t(n) + 1$ if none of the individuals in level $t(n)$ of the tree were
selected. In this case, $t(n)$ is called the extinction time of the family tree.

The idea for proving that stray individuals do not reach a given part of the
search space can be described informally using Fig. 4. One defines a certain
subset of the search space called the core within which the majority of the
population is confined with overwhelming probability. In our case, an appropriate
core can be defined using Theorems 3 and 5. One then focuses on the family
trees that are outside this core, but which have roots within the core. Note 
that some descendants of the root may re-enter the core. We therefore prune 
the family tree to those descendants which are always outside the core. More 
formally, the pruned family tree contains node x if and only if x belongs to the 
original family tree, and x and all its ancestors are outside the core. 
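
The pruning rule can be made concrete with a small data-structure sketch. Everything below is illustrative and not taken from the paper: the `Node` class, the helper names and the example core (bitstrings with at least two leading 1-bits) are hypothetical. The sketch also assumes that the root, which lies inside the core, is exempt from the "outside the core" requirement.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    """One family tree member; `individual` is a bitstring such as "0110"."""
    individual: str
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)


def add_child(parent: Node, individual: str) -> Node:
    child = Node(individual, parent)
    parent.children.append(child)
    return child


def pruned_members(root: Node, in_core: Callable[[str], bool]) -> List[str]:
    """Descendants of `root` that are outside the core and whose ancestors
    (below the root) are all outside the core as well."""
    kept: List[str] = []
    stack = list(root.children)
    while stack:
        node = stack.pop()
        if in_core(node.individual):
            continue  # the node re-entered the core: drop it and its subtree
        kept.append(node.individual)
        stack.extend(node.children)
    return kept


def leading_ones(x: str) -> int:
    return len(x) - len(x.lstrip("1"))


def in_core(x: str) -> bool:
    # Hypothetical core: search points with at least two leading 1-bits.
    return leading_ones(x) >= 2


root = Node("1100")              # root inside the core
a = add_child(root, "0100")      # leaves the core
b = add_child(root, "1110")      # stays in the core
add_child(a, "0000")             # outside, all ancestors outside: kept
a2 = add_child(a, "1100")        # re-enters the core: pruned
add_child(a2, "0100")            # outside, but an ancestor is in the core: pruned
add_child(b, "0110")             # outside, but its parent is in the core: pruned
kept = pruned_members(root, in_core)
```

In this example only the two members "0100" and "0000" survive the pruning.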

We would then like to analyse the positions of the individuals that belong to 
the pruned family tree. However, it is non-trivial to calculate the exact shape of 
this family tree. Let the random variable $\xi_x$ denote the number of offspring of
individual $x$. Clearly, the distribution of $\xi_x$ depends on how $x$ is ranked within
the population. Hence, different parts of the pruned family tree may grow at 
different rates, which can influence the position and shape of the family tree. To 
simplify the analysis, we embed the pruned family tree into a larger family tree 
which we call the non-selective family tree. This family tree has the same root 
as the real pruned family tree, however it grows through a modified selection 
process. In the real pruned family tree, the individuals have different numbers of 
offspring according to their rank in the population. In the non-selective family 
tree, the offspring distribution of all individuals x is identical to the offspring 
distribution of an individual z which is best ranked among individuals outside 
the core. We will call the expectation of this distribution the reproductive 
rate of the non-selective family tree. Hence, each individual in the non-selective 
family tree has at least as many offspring as in the real family tree. The real 
family tree will therefore occur as a sub-tree in the non-selective family tree. 
Furthermore, the probability that the real family tree reaches a given part of the 
search space is upper bounded by the probability that the non-selective family 
tree reaches this part of the search space. A related approach, where faster
growing family trees are analysed, is described by Jägersküpper and Witt [13].

Approximating the family tree by the non-selective family tree has three
important consequences. The first consequence is that the non-selective family
tree can grow faster than the real family tree, and in general beyond the
population size $\lambda$ of the original process. The second consequence is that since
all individuals in the family tree have the same offspring distribution, no
individual in the family tree has any selective advantage, hence the name
non-selective family tree. The behaviour of the family tree is therefore
independent of the fitness function, and each lineage fluctuates randomly in the
search space according to the bits flipped by the mutation operator. Such
mutation random walks are easier to analyse than the real search process. To
bound the probability that such a mutation random walk enters a certain region
of the search space, it is necessary to bound the extinction time $t(n)$ of the
non-selective family tree. The third consequence is that the sequence of random
variables $(Z_t)_{t \geq 0}$ describing the number of elements in level $t$ of the
non-selective family tree is a discrete time branching process [10]. We can
therefore apply the techniques that have been developed to study branching
processes to bound the extinction time $t(n)$.

Before introducing branching processes, we summarise the main steps in a
typical application of non-selective family trees, assuming the goal is to prove
that with overwhelming probability, an algorithm does not reach a given search
point $x^*$ within $e^{cn}$ generations for some constant $c > 0$. The first step is
to define an appropriate core, which is a subset of the search space that is
separated from $x^*$ by some distance. The second step is to prove that any
non-selective family tree outside the core will become extinct in $t(n)$ generations
with overwhelmingly high probability. This can be proved by applying results 
about branching processes, e.g. Lemma [5] and Lemma in this paper. The 
third step is to bound the number of different lineages that the family tree
has within $t(n)$ generations. Again, results about branching processes can be
applied. The fourth step involves bounding the probability that a given lineage,
starting inside the core, reaches search point $x^*$ within $t(n)$ generations. This
can be shown in various ways, depending on the application. The fifth, and
final step, is to apply a union bound over all the different lineages that can exist
within $e^{cn}$ generations.

In the second step, one should keep in mind that there are several causes 
of extinction. A reproductive rate less than 1 is perhaps the most evident 
cause of extinction. Such a low reproductive rate may occur when the fitness 
outside the core is lower than the fitness inside the core, as is the case for the 
family trees considered in Section 4.4. With a majority of the population inside
the core, each individual outside the core is selected in expectation less than
once per generation. However, a low reproductive rate is not the only cause of
extinction. This is illustrated by the core definition in Section 4.5, where the
fitness is generally higher outside, than inside the core. While the family tree 
members may in general be selected more than once per generation, the critical 
factor here is that their offspring are in expectation closer to the core than their 
parents. Hence, the lineages outside the core will have a tendency to drift back 
into the core where they are no longer considered part of the family tree due to 
the pruning process. 

Definition 7 (Single-Type Branching Process [10]). A single-type branching
process is a Markov process $Z_0, Z_1, \ldots$ on $\mathbb{N}_0$, which for all $t \geq 0$, is
given by $Z_{t+1} := \sum_{i=1}^{Z_t} \xi_i$, where $\xi_i \in \mathbb{N}_0$ are i.i.d. random
variables having $E[\xi] =: \rho$.

A branching process can be thought of as a population of identical
individuals, where each individual survives exactly one generation. Each
individual produces $\xi$ offspring independently of the rest of the population
during its lifetime, where $\xi$ is a random variable with expectation $\rho$. The
random variable $Z_t$ denotes the population size in generation $t$. Clearly, if
$Z_t = 0$ for some $t$, then $Z_{t'} = 0$ for all $t' > t$. The following lemma gives a
simple bound on the size of the population after $t \geq 1$ generations.

Lemma 8. Let $Z_0, Z_1, \ldots$ be a single-type branching process with $Z_0 := 1$ and
mean number of offspring per individual $\rho$. Define random variables
$T := \min\{t \geq 0 \mid Z_t = 0\}$, i.e. the extinction time, and $X_t$ the number of
different lineages until generation $t$. Then for any $t, k \geq 1$,

$$\Pr(Z_t \geq k) \leq \frac{\rho^t}{k}, \quad\text{and}\quad \Pr(T > t) \leq \rho^t.$$

Furthermore, if $\rho < 1$, then

$$E[X_t] \leq \frac{\rho}{1 - \rho}, \quad\text{and}\quad \Pr(X_t \geq k) \leq \frac{\rho}{k(1 - \rho)}.$$

Proof. By the law of total expectation, we have

$$E[Z_t] = E[E[Z_t \mid Z_{t-1}]] = \rho \cdot E[Z_{t-1}].$$

Repeating this $t$ times gives $E[Z_t] = \rho^t \cdot E[Z_0]$. The first part of the lemma
now follows by Markov's inequality, i.e.

$$\Pr(Z_t \geq k) \leq \frac{E[Z_t]}{k} = \frac{\rho^t}{k}.$$

The second part of the lemma is a special case of the first part for $k = 1$, i.e.
$\Pr(T > t) = \Pr(Z_t \geq 1) \leq \rho^t$. For the last two parts, note that since each
lineage must contain at least one individual that is unique to that lineage, we
have $X_t \leq Z_1 + \cdots + Z_t$. By linearity of expectation and the previous
inequalities, we can therefore conclude that

$$E[X_t] \leq \sum_{i=1}^{t} E[Z_i] \leq \sum_{i=1}^{\infty} \rho^i = \frac{\rho}{1 - \rho}.$$

Finally, it follows from Markov's inequality that

$$\Pr(X_t \geq k) \leq \frac{E[X_t]}{k} \leq \frac{\rho}{k(1 - \rho)}.$$

□
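
The survival bound of Lemma 8 can be compared with exact values for a concrete offspring distribution. The sketch below is illustrative only: the choice of Binomial(2, p) offspring and the value $\rho = 0.8$ are assumptions made for the example, not taken from the paper. It iterates the probability generating function $f(s) = (1 - p + ps)^2$ to obtain $\Pr(Z_t = 0)$ exactly.

```python
from typing import List


def survival_probabilities(p: float, horizon: int) -> List[float]:
    """Exact Pr(T > t) = Pr(Z_t >= 1) for t = 1..horizon, for a branching
    process with Z_0 = 1 and Binomial(2, p) offspring (mean rho = 2p)."""
    q = 0.0  # Pr(Z_0 = 0)
    out = []
    for _ in range(horizon):
        # Pr(Z_t = 0) = f(Pr(Z_{t-1} = 0)) with f(s) = (1 - p + p*s)^2.
        q = (1.0 - p + p * q) ** 2
        out.append(1.0 - q)
    return out


rho = 0.8                                        # hypothetical sub-critical mean
surv = survival_probabilities(rho / 2, 20)       # exact survival probabilities
bounds = [rho ** t for t in range(1, 21)]        # Lemma 8: Pr(T > t) <= rho^t
lineage_bound = rho / (1.0 - rho)                # Lemma 8: E[X_t] <= rho / (1 - rho)
```

For this distribution the exact survival probability after one generation is 0.64, compared with the bound 0.8, and the gap persists for larger $t$.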

From the preceding lemma, it is clear that the expected number of offspring
$\rho$ is important for the fate of a branching process. For $\rho < 1$, the process is
called sub-critical, for $\rho = 1$, the process is called critical, and for $\rho > 1$,
the process is called super-critical. In this paper, we will consider
sub-critical processes.

4.4 Too High Selection Pressure

In this section, it is proved that $\mathrm{SelPres}_{\sigma,\delta,k}$ is hard for Linear
Ranking EA when the ratio between parameters $\eta$ and $\chi$ is sufficiently large.
The overall proof idea is first to show that the population is likely to reach
the equilibrium position before the optimum is reached (Proposition 10 and
Theorem 5). Once the equilibrium position is reached, a majority of the
population will have significantly more than $(\sigma + \delta)n$ leading 1-bits, and
individuals that are close to the optimum are therefore less likely to be
selected (Proposition 11).

The proof of Proposition 10 builds on the result in Proposition 9, which states
that the individuals with at least $k + 3$ leading 1-bits will quickly dominate the
population. Hence, family trees of individuals with fewer than $k + 3$ leading
1-bits are likely to become extinct before they discover an optimal search point.
Recall that optimal search points have $k + 3$ leading 0-bits. In the following,
individuals with at least $k + 3$ leading 1-bits will be called $1^{k+3}$-individuals.

Proposition 9. Let $\gamma^*$ be any constant $0 < \gamma^* < 1$, and $t(\lambda) = \mathrm{poly}(\lambda)$. If
the Linear Ranking EA with population size $\lambda$, $n \leq \lambda \leq n^k$, for any constant
integer $k \geq 1$, and bit-wise mutation rate $\chi/n$ for a constant $\chi > 0$, is applied
to $\mathrm{SelPres}_{\sigma,\delta,k}$, then with probability $1 - o(1)$, all the $\gamma^*$-ranked
individuals in generation $\log \lambda$ to generation $T^* := \min\{t(\lambda), T - 1\}$ are
$1^{k+3}$-individuals, where $T$ is the number of generations until the optimum has
been found.

Proof. If the $\gamma^*$-ranked individual in some generation $t_0 \leq \log \lambda$ is a
$1^{k+3}$-individual, then by the first part of Theorem 3 with parameter
$\xi_0 := (k + 3)/n$, the $\gamma^*$-ranked individual remains so until generation $T^*$
with probability $1 - e^{-\Omega(\lambda)}$. Otherwise, we consider the run a failure.

It remains to prove that the $\gamma^*$-ranked individual in one of the first $\log \lambda$
generations is a $1^{k+3}$-individual with probability $1 - o(1)$. We apply the drift
theorem with respect to the potential function $\log(\lambda^+)$, where $\lambda^+$ is the
number of $1^{k+3}$-individuals in the population.

A run is considered failed if the fraction of $1^{k+3}$-individuals in any of the
first $T^*$ generations is less than $\gamma_0 := 1/2^{k+4}$. The initial generation is
sampled uniformly at random, so by a Chernoff bound, the probability that the
fraction of $1^{k+3}$-individuals in the initial generation is less than $\gamma_0$ is
$e^{-\Omega(\lambda)}$. Given that the initial fraction of $1^{k+3}$-individuals is at least
$\gamma_0$, it follows again by the first part of Theorem 3 with parameter
$\xi_0 := (k + 3)/n$ that this holds until generation $T^*$ with probability
$1 - e^{-\Omega(\lambda)}$. Hence, the probability of this failure event is $e^{-\Omega(\lambda)}$.

The $1^{k+3}$-individuals are fitter than any other non-optimal individuals.
Assume that the fraction of $1^{k+3}$-individuals in a given generation is $\gamma$,
$\gamma_0 \leq \gamma < \gamma^*$. In order to create a $1^{k+3}$-individual in a selection step, it
suffices to select one of the best $\gamma\lambda$ individuals, and to not mutate any of the
first $k + 3$ bit positions. The expected number of $1^{k+3}$-individuals in the
following generation is therefore at least $r(\gamma)\lambda$, where we define
$r(\gamma) := \beta(\gamma)(1 - \chi/n)^{k+3}$. The ratio $r(\gamma)/\gamma$ is linearly decreasing in
$\gamma$, and for sufficiently large $n$, strictly larger than $1 + c$, where $c > 0$ is a
constant. Hence, for all $\gamma < \gamma^*$, it holds that

$$r(\gamma) = \gamma \cdot \frac{r(\gamma)}{\gamma} \geq \gamma \cdot \frac{r(\gamma^*)}{\gamma^*} \geq \gamma(1 + c).$$

The drift is therefore for all $\gamma$, where $\gamma_0 \leq \gamma < \gamma^*$,

$$\Delta \geq \log(r(\gamma)\lambda) - \log(\gamma\lambda) \geq \log(\gamma(1 + c)\lambda) - \log(\gamma\lambda) = \log(1 + c).$$

Assuming no failure, the potential must be increased by no more than
$b(\lambda) := \log(\gamma^*\lambda) - \log(\gamma_0\lambda) = \log(\gamma^*/\gamma_0)$. By the drift theorem, the
expected number of generations until this occurs is $b(\lambda)/\Delta = O(1)$. The
probability that this does not occur within $\log \lambda$ generations is
$O(1/\log \lambda)$ by Markov's inequality.

Taking into account all the failure probabilities, the proposition now follows.

□
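
The growth ratio $r(\gamma)/\gamma$ used in the proof can be tabulated for concrete parameters. The sketch below is illustrative and rests on an assumption: it takes $\beta(\gamma) = \gamma(\eta(1 - \gamma) + \gamma)$, the usual cumulative form for linear ranking, since the paper's own definition of $\beta$ lies outside this section. All numeric parameter values are hypothetical.

```python
from typing import List


def beta(gamma: float, eta: float) -> float:
    """Assumed form: probability that linear ranking with selection pressure
    `eta` selects one of the best `gamma`-fraction of the population."""
    return gamma * (eta * (1.0 - gamma) + gamma)


def growth_ratio(gamma: float, eta: float, chi: float, n: int, k: int) -> float:
    """r(gamma) / gamma with r(gamma) = beta(gamma) * (1 - chi/n)^(k+3)."""
    return beta(gamma, eta) / gamma * (1.0 - chi / n) ** (k + 3)


# Hypothetical parameter values, for illustration only.
eta, chi, n, k = 1.5, 1.0, 10_000, 2
gamma_0, gamma_star = 1.0 / 64, 0.5
grid: List[float] = [gamma_0 + i * (gamma_star - gamma_0) / 10 for i in range(11)]
ratios = [growth_ratio(g, eta, chi, n, k) for g in grid]
```

Under these assumptions the ratio decreases along the grid and is still above 1.2 at $\gamma^*$, which matches the role of the constant $1 + c$ in the drift bound.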

Proposition 10. For any constant $r > 0$, the probability that the Linear
Ranking EA with population size $\lambda$, $n \leq \lambda \leq n^k$, for some constant integer
$k \geq 1$, and bit-wise mutation rate $\chi/n$ for a constant $\chi > 0$, has not found the
optimum of $\mathrm{SelPres}_{\sigma,\delta,k}$ within $\lambda r n^2$ function evaluations is $\Omega(1)$.

Proof. We consider the run a failure if at some point between generation $\log \lambda$
and generation $rn^2$, the $(1 + \delta)/2$-ranked individual has fewer than $k + 3$
leading 1-bits without first finding the optimum. By Proposition 9, the
probability of this failure event is $o(1)$.

Assuming that this failure event does not occur, we apply the method of
non-selective family trees with the set of $1^{k+3}$-individuals as core. Recall that
the family trees are pruned such that they only contain lineages outside the core.
However, to simplify the analysis, the family trees will not be pruned before
generation $\log \lambda$. Therefore, any family tree that is not rooted in a
$1^{k+3}$-individual must be rooted in the initial population. The proof now
considers the family trees with roots after and before generation $\log \lambda$
separately.

Case 1: We firstly consider the at most $m := \lambda r n^2 \leq r n^{k+2}$ family trees
with roots after generation $\log \lambda$. We begin by estimating the total number of
lineages, and their extinction times. The mean number of offspring $\rho$ of an
individual with rank $\gamma$ is no more than $\alpha(\gamma)$, as given in Eq. (1). Assuming
no failure, any non-optimal individual outside the core has rank at least
$\gamma := (1 + \delta)/2$. Hence for any selection pressure $\eta$, $1 < \eta \leq 2$, the mean
number of offspring of an individual in the family tree is
$\rho \leq \alpha((1 + \delta)/2) = 1 - (\eta - 1)\delta < 1$. We consider the run a failure if any
of the $m$ family trees survives longer than $t := (k + 3) \ln n / \ln(1/\rho)$
generations. By the union bound and Lemma 8, the probability of this failure
event is no more than $m\rho^t = m n^{-(k+3)} = O(1/n)$.
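
The choice of the horizon $t$ can be verified with a short calculation. The sketch below is illustrative: it assumes $\alpha(\gamma) = \eta(1 - 2\gamma) + 2\gamma$, which agrees with the value $\alpha((1 + \delta)/2) = 1 - (\eta - 1)\delta$ stated in the text, and all numeric values are hypothetical.

```python
import math


def alpha(gamma: float, eta: float) -> float:
    """Assumed form: expected number of offspring of a gamma-ranked
    individual under linear ranking with selection pressure `eta`."""
    return eta * (1.0 - 2.0 * gamma) + 2.0 * gamma


# Hypothetical parameter values, for illustration only.
eta, delta, k, r, n = 1.5, 0.1, 2, 1.0, 1000
rho = alpha((1.0 + delta) / 2.0, eta)             # equals 1 - (eta - 1) * delta
t = (k + 3) * math.log(n) / math.log(1.0 / rho)   # survival horizon in generations
m = r * n ** (k + 2)                              # bound on the number of family trees
failure_bound = m * rho ** t                      # union bound with Pr(T > t) <= rho^t
```

Since $\rho^t = n^{-(k+3)}$ by the choice of $t$, the union bound collapses to $r/n$ for these values.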

Let the random variable $P_i$ be the number of lineages in family tree $i$,
$1 \leq i \leq m$. The expected number of lineages in a given family tree is by
Lemma 8 no more than $\rho/(1 - \rho)$. We consider the run a failure if there are
more than $2m\rho/(1 - \rho)$ lineages in all these family trees. The probability of
this failure event is by Markov's inequality no more than

$$\Pr\left(\sum_{i=1}^{m} P_i \geq \frac{2m\rho}{1 - \rho}\right) \leq E\left[\sum_{i=1}^{m} P_i\right] \cdot \frac{1 - \rho}{2m\rho} \leq 1/2.$$

We now bound the probability that any given lineage contains a
$0^{k+3}$-individual, which is necessary to find an optimal search point. The
probability of flipping a given bit during $t$ generations is by the union bound
no more than $t\chi/n$, and the probability of flipping $k + 3$ bits within $t$
generations is no more than $(t\chi/n)^{k+3}$. The probability that any of the at
most $2m\rho/(1 - \rho)$ lineages contains a $0^{k+3}$-individual is by the union bound
no more than

$$(t\chi/n)^{k+3} \cdot \frac{2m\rho}{1 - \rho} = o(1).$$

Case 2: We secondly consider the family trees with roots before generation
$\log \lambda$. In the analysis, we will not prune these family trees during the first
$\log \lambda$ generations. However, after generation $\log \lambda$, the family trees will be
pruned as usual. This will only overestimate the extinction time of the family
trees. Furthermore, there will be exactly $\lambda$ such family trees, one family tree
for each of the $\lambda$ randomly chosen individuals in the initial population.

We now bound the number of lineages in these family trees, and their
extinction times. The mean number of offspring is no more than $\eta \leq 2$ during
the first $\log \lambda$ generations. Because the family trees are pruned after
generation $\log \lambda$, we can re-use the arguments from Case 1 above to show that
the mean number of offspring after generation $\log \lambda$ is no more than $\rho$, for
some constant $\rho < 1$. Let random variable $Z_t$ be the number of family tree
members in generation $t$. Analogously to the proof of Lemma 8, we have
$E[Z_t] \leq 2^t$ if $t \leq \log \lambda$, and
$E[Z_t] \leq 2^{\log \lambda}\rho^{t - \log \lambda} = \lambda\rho^{t - \log \lambda}$ for $t \geq \log \lambda$. We
consider the run a failure if any of the $\lambda$ family trees survives longer than
$\sqrt{n}$ generations. By the union bound and Markov's inequality, the probability
of this failure event is no more than $\lambda E[Z_{\sqrt{n}}] = e^{-\Omega(\sqrt{n})}$.

Let the random variable $P_i$ be the number of lineages in family tree $i$,
$1 \leq i \leq \lambda$. Similarly to the proof of Lemma 8, the expected number of
different lineages in the family tree is no more than

$$E[P_i] \leq \sum_{t=1}^{\log \lambda} E[Z_t] + \sum_{t=\log \lambda + 1}^{\infty} E[Z_t] \leq 2\lambda + \frac{\lambda\rho}{1 - \rho} = O(\lambda).$$

We consider the run a failure if there are more than $\lambda^3$ lineages in all
family trees. By Markov's inequality, the probability of this failure event is no
more than

$$\Pr\left(\sum_{i=1}^{\lambda} P_i \geq \lambda^3\right) \leq \sum_{i=1}^{\lambda} E[P_i] / \lambda^3 = O(1/\lambda).$$

We now bound the probability that a given lineage finds an optimal search
point. Define $\sigma' := \sigma - \delta - (k + 4)/n$. To find the optimum, it is necessary
that all the bits in the interval of length $\sigma' n$, starting from position $k + 4$,
are 1-bits. We consider the run a failure if any of the individuals in the
initial population has fewer than $\sigma' n/3$ 0-bits in this interval. By a Chernoff
bound and the union bound, the probability of this failure event is no more than
$\lambda e^{-\Omega(n)} = e^{-\Omega(n)}$.

The probability of flipping a given 0-bit within $\sqrt{n}$ generations is by the
union bound no more than $\chi/\sqrt{n}$. Hence, the probability that all of the at
least $\sigma' n/3$ 0-bits have been flipped is less than
$(\chi/\sqrt{n})^{\sigma' n/3} = n^{-\Omega(n)}$. The probability that any of the at most
$\lambda^3$ lineages finds the optimum within $\sqrt{n}$ generations is by the union
bound no more than $\lambda^3 n^{-\Omega(n)} = n^{-\Omega(n)}$.

If none of the failure events occur, then no globally optimal search point has
been found during the first $rn^2$ generations. The probability that any of the
failure events occur is by the union bound less than $1/2 + o(1)$. The proposition
therefore follows. □

Once the equilibrium position has been reached, we will prove that it is hard
to obtain the global optimum. We will rely on the fact that it is necessary to
have at least $\delta n/3$ 0-bits in the interval from $(\sigma + \delta)n$ to $(\sigma + 2\delta)n$, and
that any individual with a 0-bit in this interval will be ranked worse than at
least half of the population.

Proposition 11. Let $\sigma$ and $\delta$ be any constants that satisfy
$0 < \delta < \sigma < 1 - 3\delta$. If the Linear Ranking EA with population size $\lambda$,
where $n \leq \lambda \leq n^k$, for any constant integer $k \geq 1$, with selection pressure
$\eta$ and constant mutation rate $\chi > 0$ satisfying
$\eta > (2e^{\chi(\sigma + 3\delta)} - 1)/(1 - \delta)$ is applied to $\mathrm{SelPres}_{\sigma,\delta,k}$, and the
$(1 + \delta)/2$-ranked individual reaches at least $(\sigma + 2\delta)n$ leading 1-bits before
the optimum has been found, then the probability that the optimum is found
within $e^{cn}$ function evaluations is $e^{-\Omega(n)}$, for some constant $c > 0$.

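Proof. Define $\gamma := (1 + \delta)/2$, and note that

$$\frac{\beta(\gamma)}{\gamma} = \eta(1 - \gamma) + \gamma > \frac{\eta(1 - \delta) + 1}{2} > e^{\chi(\sigma + 3\delta)}.$$

Hence, we have

$$\xi^* := \ln(\beta(\gamma)/\gamma)/\chi > \sigma + 3\delta. \qquad (3)$$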

Let $\xi_0 := \sigma + 2\delta < \xi^* - \delta$. Again, we apply the technique of
non-selective family trees and define the core as the set of search points with
more than $\xi_0 n$ leading 1-bits. By the first part of Theorem 3, the probability
that the $\gamma$-ranked individual has fewer than $\xi_0 n$ leading 1-bits within $e^{cn}$
generations is $e^{-\Omega(n)}$ for sufficiently small $c$. If this event does happen, we
say that a failure has occurred. Assuming no failure, each family tree member is
selected in expectation at most $\rho \leq \alpha((1 + \delta)/2) = 1 - (\eta - 1)\delta < 1$ times
per generation.

We first estimate the extinction time of each family tree, and the total
number of lineages among the at most $m := \lambda e^{cn}$ family trees. The reproductive
rate is bounded from above by a constant $\rho < 1$. Hence, by Lemma 8, the
probability that a given family tree survives longer than $t := 2cn/\ln(1/\rho)$
generations is at most $\rho^t = e^{-2cn}$. By the union bound, the probability that
any family tree survives longer than $t$ generations is less than $\lambda e^{-cn}$, and
we say that a failure has occurred if a family tree survives longer than $t$
generations. For each $i$, where $1 \leq i \leq m$, let the random variable $P_i$ denote
the number of lineages in family tree $i$. By Lemma 8 and Markov's inequality, the
probability that the number of lineages in all the family trees exceeds
$e^{2cn}\rho/(1 - \rho)$ is

$$\Pr\left(\sum_{i=1}^{m} P_i \geq e^{2cn}\frac{\rho}{1 - \rho}\right) \leq \frac{m\rho/(1 - \rho)}{e^{2cn}\rho/(1 - \rho)} = \lambda e^{-cn}.$$

If this happens, we say that a failure has occurred.

We then bound the probability that any given member of the family tree
is optimal. To be optimal, it is necessary that there are at least $\delta n/3$ 0-bits
in the interval from 1 to $\xi_0 n$. We therefore optimistically assume that this is
the case for the family tree member in question. However, none of these 0-bits
may occur in the interval from bit position $k + 4$ to bit position $(\sigma - \delta)n$,
otherwise the family tree member is not optimal. The length of this interval
is $(\sigma - \delta - o(1))n = \Omega(n)$. Since the family tree is non-selective, the
positions of these 0-bits are chosen uniformly at random among the $\xi_0 n$ bit
positions. In particular, the probability of choosing a 0-bit within this
interval, assuming no such bit position has been chosen yet, is at least
$\Omega(n)/\xi_0 n \geq c'$, for some constant $c' > 0$. The probability that none of the
at least $\delta n/3$ 0-bits are chosen from this interval is no more than
$(1 - c')^{\delta n/3} = e^{-\Omega(n)}$.

There are at most $t$ family tree members per lineage. The probability that
any of the $t e^{2cn}\rho/(1 - \rho) \leq e^{3cn}$ family tree members is optimal is by the
union bound no more than $e^{3cn} \cdot e^{-\Omega(n)} = e^{-\Omega(n)}$, assuming that $c$ is a
sufficiently small constant. Taking into account all the failure probabilities,
the probability that the optimum is found within $e^{cn}$ generations is
$e^{-\Omega(n)}$, for a sufficiently small constant $c > 0$. □

By combining the previous, intermediate results, we can finally prove the 
main result of this section. 

Theorem 12. Let $\sigma$ and $\delta$ be any constants that satisfy $0 < \delta < \sigma < 1 - 3\delta$.
The expected runtime of the Linear Ranking EA with population size $\lambda$,
$n \leq \lambda \leq n^k$, for any integer $k \geq 1$, and selection pressure $\eta$ and constant
mutation rate $\chi > 0$ satisfying $\eta > (2e^{\chi(\sigma + 3\delta)} - 1)/(1 - \delta)$ is
$e^{\Omega(n)}$.

Proof. Define $\gamma := (1 + \delta)/2$ and $\xi^* := \ln(\beta(\gamma)/\gamma)/\chi$. By Eq. (3) in the
proof of Proposition 11, it holds that $\xi^* - \delta > \sigma + 2\delta$. By Theorem 5 and
Markov's inequality, there is a constant probability that the $\gamma$-ranked
individual has reached at least $(\xi^* - \delta)n > (\sigma + 2\delta)n$ leading 1-bits within
$rn^2$ generations, for some constant $r$. By Proposition 10, the probability that
the optimum has not been found within the first $rn^2$ generations is $\Omega(1)$. If
the optimum has not been found before the $\gamma$-ranked individual has
$(\sigma + 2\delta)n$ leading 1-bits, then by Proposition 11, the expected runtime is
$e^{\Omega(n)}$. The unconditional expected runtime of the Linear Ranking EA is
therefore $e^{\Omega(n)}$. □

4.5 Too Low Selection Pressure 

This section proves an analogue to Theorem 12 for parameter settings where
the equilibrium position $n(\ln \eta)/\chi$ is below $(\sigma - \delta)n$, i.e., it is shown that
$\mathrm{SelPres}_{\sigma,\delta,k}$ is also hard when the selection pressure is too low. To prove
this, it suffices to show that with overwhelming probability, no individual
reaches more than $n\ln(\eta\kappa\phi)/\chi$ leading 1-bits in exponential time, for
appropriately chosen constants $\kappa, \phi > 1$. Again, we will apply the technique of
non-selective family trees, but with a different core than in the previous
section. The core is here defined as the set of search points with prefix sum
less than $n\ln(\eta\kappa)/\chi$, where the prefix sum is the number of 1-bits in the first
$n\ln(\eta\kappa\phi)/\chi$ bit positions of the search point. Clearly, to obtain at least
$n\ln(\eta\kappa\phi)/\chi$ leading 1-bits, it is necessary to have prefix sum exactly
$n\ln(\eta\kappa\phi)/\chi$. We will consider individuals outside the core, i.e., the
individuals with prefix sums in the interval from $n\ln(\eta\kappa)/\chi$ to
$n\ln(\eta\kappa\phi)/\chi$. Note that choosing $\kappa$ and $\phi$ to be constants slightly larger
than 1 implies that this interval begins slightly above the equilibrium position
$n\ln(\eta)/\chi$ given by Theorem 3 (see Fig. 5).

Single-type branching processes are not directly applicable to analyse this 
drift process, because they have no way of representing how far each family tree 
member is from the core. Instead, we will consider a more detailed model based 
on multi-type branching processes (see e.g. Haccou et al. [10]). Such branching
processes generalise single-type branching processes by having individuals of 
multiple types. In our application, the type of an individual corresponds to 
the prefix-sum of the individual. Before defining and studying this particular 
process, we will describe some general aspects of multi-type branching processes. 

Definition 13 (Multi-Type Branching Process [10]). A multi-type branching
process with $d$ types is a Markov process $Z_0, Z_1, \ldots$ on $\mathbb{N}_0^d$, which for all
$t \geq 0$, is given by

$$Z_{t+1} := \sum_{j=1}^{d} \sum_{i=1}^{Z_{tj}} \xi_i^{(j)},$$

where for all $j$, $1 \leq j \leq d$, $\xi_i^{(j)} \in \mathbb{N}_0^d$ are i.i.d. random vectors having
expectation $E[\xi^{(j)}] =: (m_{j1}, m_{j2}, \ldots, m_{jd})^T$. The associated matrix
$M := (m_{jk})_{d \times d}$ is called the mean matrix of the process.

Definition 13 states that the population vector $Z_{t+1}$ for generation $t + 1$
is defined as a sum of offspring vectors, one offspring vector for each of the
individuals in generation $t$. In particular, the vector element $Z_{tj}$ denotes the
number of individuals of type $j$, $1 \leq j \leq d$, in generation $t$, and $\xi_i^{(j)}$
denotes the offspring vector for the $i$-th individual, $1 \leq i \leq Z_{tj}$, of type $j$.
The $k$-th element, $1 \leq k \leq d$, of this offspring vector $\xi_i^{(j)}$ represents the
number of offspring of type $k$ this individual produced.

Analogously to the case of single-type branching processes, the expectation
of a multi-type branching process $(Z_t)_{t \geq 0}$ with mean matrix $M$ follows

$$E[Z_t]^T = E[E[Z_t \mid Z_{t-1}]]^T = E[Z_{t-1}]^T M = E[Z_0]^T M^t.$$

Hence, the long-term behaviour of the branching process depends on the matrix
power $M^t$. Calculating matrix powers can in general be non-trivial. However, if
the branching process has the property that for any pair of types $i, j$, it is
possible that a type $j$-individual has an ancestor of type $i$, then the
corresponding mean matrix is irreducible [26].

Definition 14 (Irreducible matrix [26]). A $d \times d$ non-negative matrix $M$ is irreducible if for every pair $i, j$ of its index set, there exists a positive integer $t$ such that $m_{ij}^{(t)} > 0$, where $m_{ij}^{(t)}$ are the elements of the $t$-th matrix power $M^t$.
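As a small sketch of how Definition 14 can be checked in practice: an entry of $M^t$ is positive exactly when there is a path of length $t$ in the directed graph that has an edge $i \to j$ whenever $m_{ij} > 0$. Irreducibility therefore reduces to a reachability test. The function name `is_irreducible` is hypothetical.

```python
def is_irreducible(M):
    """Check Definition 14 via reachability along paths of positive length."""
    d = len(M)
    succ = [[j for j in range(d) if M[i][j] > 0] for i in range(d)]
    for i in range(d):
        reached = set(succ[i])          # indices reachable in exactly one step
        frontier = list(reached)
        while frontier:
            u = frontier.pop()
            for w in succ[u]:
                if w not in reached:
                    reached.add(w)
                    frontier.append(w)
        if len(reached) < d:            # some j has no positive (M^t)_ij
            return False
    return True
```

For example, the cyclic matrix `[[0, 1], [1, 0]]` is irreducible, whereas the upper-triangular matrix `[[1, 1], [0, 1]]` is not, because the second type can never produce the first.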

If the mean matrix $M$ is irreducible, then Theorem 15 implies that the asymptotics of the matrix power $M^t$ depend on the largest eigenvalue of $M$.

Theorem 15 (Perron-Frobenius [10]). If $M$ is an irreducible matrix with non-negative elements, then it has a unique positive eigenvalue $\rho$, called the Perron root of $M$, that is greater in absolute value than any other eigenvalue. All elements of the left and right eigenvectors $u = (u_1, \ldots, u_d)^T$ and $v = (v_1, \ldots, v_d)^T$ that correspond to $\rho$ can be chosen positive and such that $\sum_{k=1}^{d} u_k = 1$ and $\sum_{k=1}^{d} u_k v_k = 1$. In addition,

$$M^n = \rho^n \cdot A + B^n,$$

where $A = (v_i u_j)_{i,j=1}^{d}$ and $B$ are matrices that satisfy the conditions

1. $AB = BA = 0$.

2. There are constants $\rho_1 \in (0, \rho)$ and $C > 0$ such that none of the elements of the matrix $B^n$ exceeds $C\rho_1^n$.
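The quantities in Theorem 15 can be approximated numerically. The sketch below uses plain power iteration, which is an assumption of this example rather than a method from the paper, and which converges here because the illustrative matrix has strictly positive entries. The helper names and the matrix values are hypothetical.

```python
def mat_vec(M, x):
    """Matrix-vector product M x for a list-of-lists matrix."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]

def mat_mul(X, Y):
    """Matrix product X Y."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def perron(M, iters=500):
    """Power iteration: returns (rho, eigenvector with entries summing to 1)."""
    x = [1.0 / len(M)] * len(M)
    rho = 0.0
    for _ in range(iters):
        y = mat_vec(M, x)
        rho = sum(y)               # tends to rho because sum(x) == 1
        x = [yi / rho for yi in y]
    return rho, x

M = [[0.5, 0.3], [0.2, 0.4]]                 # illustrative positive matrix
rho, v = perron(M)                           # right eigenvector: M v = rho v
_, u = perron([list(c) for c in zip(*M)])    # left eigenvector: u^T M = rho u^T
scale = sum(uk * vk for uk, vk in zip(u, v))
v = [vk / scale for vk in v]                 # now sum_k u_k = 1, sum_k u_k v_k = 1
A = [[vi * uj for uj in u] for vi in v]      # A = (v_i u_j)

P = M
for _ in range(29):
    P = mat_mul(P, M)                        # P = M^30
residual = max(abs(P[i][j] / rho ** 30 - A[i][j])
               for i in range(2) for j in range(2))
```

For this matrix the eigenvalues are 0.7 and 0.2, so `rho` approaches 0.7 and `residual` is negligible, in line with $M^n = \rho^n A + B^n$ where the elements of $B^n$ decay at the faster rate $\rho_1^n$.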

A central attribute of a multi-type branching process is therefore the Perron root of its mean matrix $M$, denoted $\rho(M)$. A multi-type branching process with mean matrix $M$ is classified as sub-critical if $\rho(M) < 1$, critical if $\rho(M) = 1$ and super-critical if $\rho(M) > 1$. Theorem 15 implies that any sub-critical multi-type branching process will eventually become extinct. However, to obtain good bounds on the probability of extinction within a given number of generations $t$ using Theorem 15, one also has to take into account matrix $A$ that is defined in terms of both the left and right eigenvectors. Instead of directly applying Theorem 15, it will be more convenient to use the following lemma.
Lemma 16 ([10]). Let $Z_0, Z_1, \ldots$ be a multi-type branching process with irreducible mean matrix $M = (m_{ij})_{d \times d}$. If the process started with a single individual of type $h$, then for any $k > 0$ and $t \ge 1$,

$$\Pr\left(\sum_{j=1}^{d} Z_{tj} \ge k \;\Big|\; Z_0 = e_h\right) \le \frac{\rho(M)^t}{k} \cdot \frac{v_h}{v^*},$$

where $e_h$, $1 \le h \le d$, denote the standard basis vectors, $\rho(M)$ is the Perron root of $M$ with the corresponding right eigenvector $v$, and $v^* := \min_{1 \le i \le d} v_i$.

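Proof. The proof follows [10, p. 122]. By Theorem 15, matrix $M$ has a unique largest eigenvalue $\rho(M)$, and all the elements of the corresponding right eigenvector $v$ are positive, implying $v^* > 0$. The probability that the process consists of at least $k$ individuals in generation $t$, conditional on the event that the process started with a single individual of type $h$, can be bounded as

$$\Pr\left(\sum_{j=1}^{d} Z_{tj} \ge k \;\Big|\; Z_0 = e_h\right) = \Pr\left(\sum_{j=1}^{d} Z_{tj} v^* \ge k v^* \;\Big|\; Z_0 = e_h\right) \le \Pr\left(\sum_{j=1}^{d} Z_{tj} v_j \ge k v^* \;\Big|\; Z_0 = e_h\right).$$

Markov's inequality and linearity of expectation give

$$\Pr\left(\sum_{j=1}^{d} Z_{tj} v_j \ge k v^* \;\Big|\; Z_0 = e_h\right) \le E\left[\sum_{j=1}^{d} Z_{tj} v_j \;\Big|\; Z_0 = e_h\right] \cdot \frac{1}{k v^*} = \sum_{j=1}^{d} E[Z_{tj} \mid Z_0 = e_h] \cdot \frac{v_j}{k v^*}.$$

As seen above, the expectation on the right hand side can be expressed as

$$E[Z_t \mid Z_0 = e_h]^T = E[Z_0 \mid Z_0 = e_h]^T M^t.$$

Additionally, by taking into account the starting conditions, $Z_{0h} = 1$ and $Z_{0j} = 0$, for all indices $j \ne h$, this simplifies further to

$$\sum_{j=1}^{d} E[Z_{tj} \mid Z_0 = e_h] \cdot \frac{v_j}{k v^*} = \sum_{j=1}^{d} \sum_{i=1}^{d} E[Z_{0i} \mid Z_0 = e_h] \, m_{ij}^{(t)} \cdot \frac{v_j}{k v^*} = \frac{1}{k v^*} \sum_{j=1}^{d} m_{hj}^{(t)} v_j.$$

Finally, by iterating

$$M^t v = M^{t-1}(M v) = \rho(M) \cdot M^{t-1} v,$$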
Figure 5: Multi-type Branching Process Model in Theorem 19. The prefix sum of an individual is the number of 1-bits in the first $n\ln(\eta\kappa\phi)/\chi$ bit-positions. The population core contains all individuals with prefix sum lower than $n\ln(\eta\kappa)/\chi$, which is slightly above the equilibrium value of $n\ln(\eta)/\chi$ from Theorem 3. The multi-type branching process considers individuals outside the core, where the type of an individual is given by the number of 0-bits in the first $n\ln(\eta\kappa\phi)/\chi$ bit-positions. The probability that an offspring of a type $i$-individual is a type $j$-individual, is $p_{ij}$. (Figure labels: branching process, type.)



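which on coordinate form gives

$$\sum_{j=1}^{d} m_{hj}^{(t)} v_j = \rho(M)^t \cdot v_h,$$

one obtains the final bound

$$\Pr\left(\sum_{j=1}^{d} Z_{tj} \ge k \;\Big|\; Z_0 = e_h\right) \le \frac{\rho(M)^t}{k} \cdot \frac{v_h}{v^*}.$$

□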
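To make the bound of Lemma 16 concrete, the following sketch evaluates it for a small illustrative mean matrix and compares it with the exact expected family size $\sum_j m_{hj}^{(t)}$, which by Markov's inequality also bounds the same probability for $k = 1$. The matrix values, the helper names, and the use of power iteration are assumptions of this example. Type indices are 0-based in the code.

```python
def mat_vec(M, x):
    """Matrix-vector product M x for a list-of-lists matrix."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]

def perron(M, iters=500):
    """Power iteration: returns (rho, right eigenvector with entries summing to 1)."""
    x = [1.0 / len(M)] * len(M)
    rho = 0.0
    for _ in range(iters):
        y = mat_vec(M, x)
        rho = sum(y)
        x = [yi / rho for yi in y]
    return rho, x

def lemma16_bound(M, h, k, t):
    """Right-hand side of Lemma 16: rho^t / k * v_h / v_min."""
    rho, v = perron(M)
    return rho ** t / k * v[h] / min(v)

def expected_size(M, h, t):
    """E[sum_j Z_tj | Z_0 = e_h], i.e. the h-th row sum of M^t."""
    ones = [1.0] * len(M)
    for _ in range(t):
        ones = mat_vec(M, ones)     # after t steps: M^t applied to the all-ones vector
    return ones[h]

M = [[0.5, 0.3], [0.2, 0.4]]        # illustrative sub-critical mean matrix, rho = 0.7
bound = lemma16_bound(M, h=0, k=1, t=10)
exact_mean = expected_size(M, h=0, t=10)
```

Here `exact_mean` does not exceed `bound`, as the proof predicts: replacing each weight $v^*$ by $v_j \ge v^*$ can only increase the weighted sum.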

We will now describe how to model a non-selective family tree outside the core as a multi-type branching process (see Fig. 5). Recall that the prefix sum of a search point is the number of 1-bits in the first $n\ln(\eta\kappa\phi)/\chi$ bit positions of the search point, and that the core is defined as all search points with prefix sum less than $n\ln(\eta\kappa)/\chi$ 1-bits. The process has $n\ln(\phi)/\chi$ types. A family tree member has type $i$ if its prefix sum is $n\ln(\eta\kappa\phi)/\chi - i$. The element $a_{ij}$ of the mean matrix $A$ of this branching process represents the expected number of type $j$ offspring that a type $i$-individual gets per generation. Since we are looking for a lower bound on the extinction probability, we will over-estimate the matrix elements, which can only decrease the extinction probability. By the definition of linear ranking selection, the expected number of times during one generation in which any individual is selected is no more than $\eta$. We will therefore use $a_{ij} = \eta \cdot p_{ij}$, where $p_{ij}$ is the probability that mutating a type $i$-individual creates a type $j$-individual. To simplify the proof of the second part of Lemma 18, we overestimate the probability $p_{ij}$ for the indices $i$ and $j$ where $j - i \ge 2\log n + 1$ (see the first case of Definition 17). Note that the probability that none of the first $n\ln(\eta\kappa)/\chi$ bits are flipped is less than $\exp(-\ln(\eta\kappa)) = 1/(\eta\kappa)$. In particular, this means that $\eta \cdot p_{ii} < \eta/(\eta\kappa) = 1/\kappa =: a_{ii}$. The full definition of the mean matrix is as follows.

Definition 17 (Mean Matrix A). For any integer $n \ge 1$ and real numbers $\eta, \chi, \phi, \kappa, \varepsilon$, where $0 < \chi$, $1 < \eta$, and $1 < \phi < \kappa \le \varepsilon$, define the $n\ln(\phi)/\chi \times n\ln(\phi)/\chi$ matrix $A = (a_{ij})$ as

rj/ri^ if 2 log n + 1 < j — i, 

V ■ (" '"(;_t^/^) • 1 < J - * < 2 log n, 

1/k if i = j, and 

In order to apply Lemma 16 to mean matrix $A$ defined above, we first provide upper bounds on the Perron root of $A$ and on the maximal ratio between the elements of the corresponding right eigenvector.

Lemma 18. For any integer $n \ge 1$, and real numbers $\eta$, $1 < \eta \le 2$, $\chi > 0$, and $\varepsilon > 1$, there exist real numbers $\kappa$ and $\phi$, $1 < \phi < \kappa \le \varepsilon$, such that matrix $A$ given by Definition 17 has Perron root bounded from above by $\rho(A) < c$ for some constant $c < 1$. Furthermore, for any $h$, $1 \le h \le n\ln(\phi)/\chi$, the corresponding right eigenvector $v$, where $v^* := \min_i v_i$, satisfies

$$\frac{v_h}{v^*} \le 2^{n\ln(\phi)/\chi} \cdot \left(\frac{n}{\chi}\right)^{n\ln(\phi)/\chi - h}.$$

Proof. Set $\kappa := \varepsilon$. Since $a_{ij} > 0$ for all $i, j$, matrix $A$ is by Definition 14 irreducible, and Theorem 15 applies to the matrix. Expressing the matrix as $A = 1/\kappa \cdot I + B$, where $B := A - 1/\kappa \cdot I$, and $I$ is the identity matrix, the Perron root is $\rho(A) = 1/\kappa + \rho(B)$.

The Frobenius bound for the Perron root of a non-negative matrix $M = (m_{ij})$ states that $\rho(M) \le \max_j c_j(M)$ [19], where $c_j(M) := \sum_i m_{ij}$ is the $j$-th column sum of $M$. However, when applied directly to our matrix, this bound is insufficient for our purposes. Instead, we can consider the transformation $SBS^{-1}$, for an invertible matrix

$$S := \mathrm{diag}(x_1, x_2, \ldots, x_{n\ln(\phi)/\chi}).$$

To see why this transformation is helpful, note that for any matrix $A$ with the same dimensions as $S$, we have $\det(SAS^{-1}) = \det(A)$. So if $\rho$ is an eigenvalue of $B$, then

$$0 = \det(B - \rho I) = \det(S(B - \rho I)S^{-1}) = \det(SBS^{-1} - \rho I),$$

and $\rho$ must also be an eigenvalue of $SBS^{-1}$. It follows that $\rho(B) = \rho(SBS^{-1})$. We will therefore apply the Frobenius bound to the matrix $SBS^{-1}$, which has off-diagonal elements

$$(SBS^{-1})_{ij} = a_{ij} \cdot \frac{x_i}{x_j}.$$
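The effect of the diagonal similarity transformation on the column-sum bound can be seen on a toy example. The $2 \times 2$ matrix and the scaling vector below are assumptions chosen for illustration only, and are unrelated to the matrix $A$ of Definition 17. The helper names are hypothetical.

```python
def max_column_sum(M):
    """Frobenius upper bound on the Perron root: the largest column sum."""
    return max(sum(col) for col in zip(*M))

def diagonal_similarity(B, x):
    """Return S B S^{-1} for S = diag(x), i.e. entries b_ij * x_i / x_j."""
    d = len(B)
    return [[B[i][j] * x[i] / x[j] for j in range(d)] for i in range(d)]

B = [[0.0, 4.0], [0.25, 0.0]]        # eigenvalues +1 and -1, so the Perron root is 1
plain_bound = max_column_sum(B)                                   # 4.0
scaled_bound = max_column_sum(diagonal_similarity(B, [1.0, 4.0])) # 1.0
```

Both values bound the same Perron root, since the transformation preserves the eigenvalues, but a suitable choice of the $x_i$ balances the columns and makes the bound tight in this example.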



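Define $x_i := q^i$, where

$$q := \frac{\ln(\eta\kappa\phi)}{\ln(1 + 1/(r\eta))},$$

for some constant $r > 1/(\eta - 1) \ge 1$ that will be specified later. Since $\eta = 1 + c$ for some $c > 0$, the constant $q$ is bounded as

$$q > \frac{\ln \eta}{\ln\left(1 + \frac{1}{r\eta}\right)} > \frac{\ln \eta}{\ln\left(2 - \frac{1}{\eta}\right)} = \frac{\ln \eta}{\ln \eta + \ln\left(\frac{2}{\eta} - \frac{1}{\eta^2}\right)} > 1.$$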

The sum of any column j can be bounded by the three sums 

j-2 logn-1 

> aij ■ — <n- = -, 



3-1 i-1 



i=j-2\ogn ■' 1=1 ^ ^ 



i=l 



U-i) 



(ln(7?K0) /g) 



fe 



k=l 

T] ■ {ex.p{ln{riK^)/q) — 1), and 



nln(0)/x . nln{4>)/\ , . ^ 

Xi 1 / i \ 



E «^.-- = -- E •(-) 

^l_"^"^^/Vnln(0)/xy/xy-.^. 



-•E 



- • (exp(gln?i) - 1). 

K 
Figure 6: Structure of matrix $A$ in Definition 17. (Figure labels: regions 1a, 2a, 3a and 1b, 2b, 3b; dimensions $2\log n$ and $n\ln(\phi)/\chi - 2\log n$.)



The Perron root of matrix A can now be bounded by 
p{A) < - +in&xCj{SBS-^) 



< 



1 

K 

V 



nln(0)/x 

max a 



1) 



n 

1 
n 



1 

K 



exp(g In < 



r/ • (exp(ln(7/K0)/g) 

1 (hi 
- + —. 

r K 

Choosing (j) sufficiently small, such that 1 < < n^^'^i, and defining the constant 
""■=1^1' > ^/('^ " ^^^^ 

1 

r K 



P{A)< 



< 



2V^ 

1 1 

2 ^ 



1 



< 1. 



The second part of the lemma involves bounding, for any $h$, the ratio $v_h/v^*$, where $v$ is the right eigenvector corresponding to the eigenvalue $\rho$. In the special case where the index $h$ corresponds to the eigenvector element with largest value, this ratio is called the principal ratio. By generalising Minc's bound for the principal ratio [19], one obtains the upper bound

$$\frac{v_h}{v^*} = \max_k \frac{v_h}{v_k} = \max_k \frac{\rho v_h}{\rho v_k} = \max_k \frac{\sum_j a_{hj} v_j}{\sum_j a_{kj} v_j} \le \max_{k,j} \frac{a_{hj}}{a_{kj}}.$$

It now suffices to prove that the matrix elements of $A$ satisfy

$$\frac{a_{hj}}{a_{kj}} \le 2^{n\ln(\phi)/\chi} \cdot \left(\frac{n}{\chi}\right)^{n\ln(\phi)/\chi - h}$$

for all indices $h$, $j$ and $k$.

To prove that these inequalities hold, we first find a lower bound $a^*$ on the minimal element along any column, i.e. $\min_k a_{kj} \ge a^*$, for any column index $j$. As illustrated in Fig. 6, the matrix elements of $A$ can be divided into six cases according to their column and row indices. For case 1a and 1b, where $2\log n + 1 \le j - k \le n\ln(\phi)/\chi$,

1 

TV 

For case 2a and 2b, where < j — fc < 2 logn, 

^ \n) ^ In. 
For case 3a and 3b, where k > j, 



Hence, we can use the lower bound 



■ij^ynnW/x-J ifj<„in(0)/;^_21ogn, and 
(n) otherwise. 



We then upper bound the ratio $a_{hj}/a^*$ for all column indices $j$. All elements of the matrix satisfy $a_{hj} \le \eta$. Therefore, in case 1b, 2b and 3b, where $j > n\ln(\phi)/\chi - 2\log n$,



«1 - Vx 



2 log n 



In case la and 2a, where h < j < nln((/))/x — 21ogn, 

^ - [x) - [xj 

Finally, in case 3a, where j < h and j < n\n{(j))/x — 21ogn, 
a* - K\h-j) \n) \x 

n\n(4>)/x-h 

< 2n\n(4>)/x . [ '1' 



.Xj 

The second part of the lemma therefore holds. □ 
Having all the ingredients required to apply Lemma 16 to the mean matrix in Definition 17, we are now ready to prove the main technical result of this section. Note that this result implies that Conjecture 1 in [18] holds.

Theorem 19. For any positive constant $\varepsilon$, and some positive constant $c$, the probability that, during $e^{cn}$ generations of Linear Ranking EA with population size $\lambda = \mathrm{poly}(n)$, selection pressure $\eta$, and mutation rate $\chi/n$, there exists any individual with at least $n((\ln \eta)/\chi + \varepsilon)$ leading 1-bits is $e^{-\Omega(n)}$.

Proof. In the following, $\kappa$ and $\phi$ are two constants such that $(\ln \kappa + \ln \phi)/\chi = \varepsilon$, where the relative magnitudes of $\kappa$ and $\phi$ are as given in the proof of Lemma 18.

Let the prefix sum of a search point be the number of 1-bits in the first $n\ln(\eta\kappa\phi)/\chi$ bits. We will apply the technique of non-selective family trees, where the core is defined as the set of search points with prefix sum less than $n\ln(\eta\kappa)/\chi$ 1-bits. Clearly, any non-optimal individual in the core has fitness lower than $n\ln(\eta\kappa)/\chi$.

To estimate the extinction time of a given family tree, we consider the multi-type branching process $Z_0, Z_1, \ldots$ having $n\ln(\phi)/\chi$ types, and where the mean matrix $A$ is given by Definition 17. Let the random variable $S_t := \sum_{i=1}^{n\ln(\phi)/\chi} Z_{ti}$ be the family size in generation $t$. By Lemma 16 and Lemma 18, it is clear that the extinction probability of the family tree depends on the type of the root of the family tree. The higher the prefix sum of the family root, the lower the extinction probability. The parent of the root of the family tree has prefix sum lower than $n\ln(\eta\kappa)/\chi$, hence the probability that the root of the family tree has type $h$ is bounded by

$$\Pr(Z_0 = e_h) \le \binom{n\ln(\phi)/\chi}{n\ln(\phi)/\chi - h} \left(\frac{\chi}{n}\right)^{n\ln(\phi)/\chi - h}.$$

By Lemma 16 and Lemma 18, the probability that the family tree has at least $k$ members in generation $t$ is for sufficiently large $n$ and sufficiently small $\phi$ bounded by

$$\begin{aligned}
\Pr(S_t \ge k) &= \sum_{h=1}^{n\ln(\phi)/\chi} \Pr(Z_0 = e_h) \cdot \Pr\left(\sum_{j=1}^{n\ln(\phi)/\chi} Z_{tj} \ge k \;\Big|\; Z_0 = e_h\right) \\
&\le \sum_{h=1}^{n\ln(\phi)/\chi} \binom{n\ln(\phi)/\chi}{n\ln(\phi)/\chi - h} \left(\frac{\chi}{n}\right)^{n\ln(\phi)/\chi - h} \cdot \frac{\rho(A)^t}{k} \cdot \frac{v_h}{v^*} \\
&\le 2^{n\ln(\phi)/\chi} \cdot \frac{\rho(A)^t}{k} \cdot \sum_{h=0}^{n\ln(\phi)/\chi} \binom{n\ln(\phi)/\chi}{h} \\
&= 2^{2n\ln(\phi)/\chi} \cdot \frac{\rho(A)^t}{k}.
\end{aligned}$$

By Lemma 18, the Perron root of matrix $A$ is bounded from above by a constant $\rho(A) < 1$. Hence, for any constant $w > 0$, the constant $\phi$ can be chosen sufficiently small such that for large $n$, the probability is bounded by $\Pr(S_t \ge k) \le \rho(A)^{t - wn}/k$.

For $k = 1$ and $w < 1$, the probability that the non-selective family tree is not extinct in $n$ generations, i.e., that the height of the tree is larger than $n$, is at most $\rho(A)^{n - wn} = e^{-\Omega(n)}$. Furthermore, the probability that the width of the non-selective family tree exceeds $k = \rho(A)^{-2wn}$ in any generation is by union bound less than $n\rho(A)^{wn} = e^{-\Omega(n)}$.

We now consider a phase of $e^{cn}$ generations. The number of family trees outside the core during this period is less than $\lambda e^{cn}$. The probability that any of these family trees survives longer than $n$ generations, or is wider than $\rho(A)^{-2wn}$, is by union bound $\lambda e^{cn} \cdot (e^{-\Omega(n)} + e^{-\Omega(n)}) = e^{-\Omega(n)}$ for a sufficiently small constant $c$. The number of paths from root to leaf within a single family tree is bounded by the product of the height and the width of the family tree. Hence, the expected number of different paths from root to leaf in all family trees is less than $\lambda e^{cn} n \rho(A)^{-2wn}$. The probability that it exceeds $e^{2cn}\rho(A)^{-2wn}$ is by Markov's inequality $\lambda e^{cn} n e^{-2cn} = e^{-\Omega(n)}$.

The parent of the root of each family tree has prefix sum no larger than $n\ln(\eta\kappa)/\chi$. In order to reach at least $n\ln(\eta\kappa\phi)/\chi$ leading 1-bits, it is therefore necessary to flip $n\ln(\phi)/\chi$ 0-bits within $n$ generations. The probability that a given 0-bit is not flipped during $n$ generations is $(1 - \chi/n)^n \ge p$ for some constant $p > 0$. Hence, the probability that all of the $n\ln(\phi)/\chi$ 0-bits are flipped at least once within $n$ generations is no more than $(1 - p)^{n\ln(\phi)/\chi} = e^{-c'n}$ for some constant $c' > 0$. Hence, by union bound, the probability that any of the paths attains at least $n\ln(\eta\kappa\phi)/\chi$ leading 1-bits is less than $e^{2cn}\rho(A)^{-2wn}e^{-c'n} = e^{-\Omega(n)}$ for sufficiently small $c$ and $w$. □

Using Theorem 19, it is now straightforward to prove that SELPRES$_{\sigma,\delta,k}$ is hard for the Linear Ranking EA when the ratio between the selection pressure $\eta$ and the mutation rate $\chi$ is too small.

Corollary 20. The probability that Linear Ranking EA with population size $\lambda = \mathrm{poly}(n)$, bit-wise mutation rate $\chi/n$, and selection pressure $\eta$ satisfying $\eta < \exp(\chi(\sigma - \delta)) - \varepsilon$ for any $\varepsilon > 0$, finds the optimum of SELPRES$_{\sigma,\delta,k}$ within $e^{cn}$ function evaluations is $e^{-\Omega(n)}$, for some constant $c > 0$.

Proof. In order to reach the optimum, it is necessary to obtain an individual having at least $n(\sigma - \delta)$ leading 1-bits. However, by Theorem 19, the probability that this happens within $e^{cn}$ generations is $e^{-\Omega(n)}$ for some constant $c > 0$. □
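The condition in Corollary 20 can be read as saying that the equilibrium position $n\ln(\eta)/\chi$ lies below the $n(\sigma - \delta)$ leading 1-bits needed to reach the optimum. The following sketch evaluates both quantities. The parameter values are illustrative assumptions and the function names are hypothetical.

```python
import math

def equilibrium_position(n, eta, chi):
    """Equilibrium number of leading 1-bits, n * ln(eta) / chi."""
    return n * math.log(eta) / chi

def pressure_too_low(eta, chi, sigma, delta, eps):
    """Condition of Corollary 20: eta < exp(chi * (sigma - delta)) - eps."""
    return eta < math.exp(chi * (sigma - delta)) - eps

# Illustrative values only: n = 100, chi = 1, sigma = 0.5, delta = 0.1.
n, chi, sigma, delta, eps = 100, 1.0, 0.5, 0.1, 0.01
required = n * (sigma - delta)               # leading 1-bits needed: 40.0
low = equilibrium_position(n, 1.2, chi)      # about 18.2, below the requirement
too_low = pressure_too_low(1.2, chi, sigma, delta, eps)
not_too_low = pressure_too_low(1.6, chi, sigma, delta, eps)
```

With these assumed values, $\eta = 1.2$ satisfies the condition of the corollary, whereas $\eta = 1.6$ does not.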

5 Conclusion 

The aim of this paper has been to better understand the relationship between mutation and selection in EAs, and in particular to what degree this relationship can have an impact on the runtime. To this end, we have rigorously analysed the runtime of a non-elitist population-based EA that uses linear ranking selection and bit-wise mutation on a family of fitness functions. We have focused on two parameters of the EA, $\eta$ which controls the selection pressure, and $\chi$ which controls the bit-wise mutation rate.

The theoretical results show that there exist fitness functions where the parameter settings of selection pressure $\eta$ and mutation rate $\chi$ have a dramatic impact on the runtime. To achieve polynomial runtime on the problem, the settings of these parameters need to be within a narrow critical region of the parameter space, as illustrated in Fig. 2. An arbitrarily small increase in the mutation rate, or decrease in the selection pressure, can increase the runtime of the EA from a small polynomial (i.e. highly efficient) to exponential (i.e. highly inefficient). The critical factor which determines whether the EA is efficient on the problem is not the individual parameter settings of $\eta$ or $\chi$, but rather the ratio between these two parameters. A too high mutation rate $\chi$ can be balanced by increasing the selection pressure $\eta$, and a too low selection pressure $\eta$ can be balanced by decreasing the mutation rate $\chi$. Furthermore, the results show that the EA will also have exponential runtime if the selection pressure becomes too high, or the mutation rate becomes too low. Note that the position of the critical region in the parameter space in which the EA is efficient is problem dependent. Hence, the EA may be efficient with a given mutation rate and selection pressure on one problem, but be highly inefficient with the same parameter settings on another problem. There is therefore no balance between selection and mutation that is good on all problems. The results shed some light on the possible reasons for the difficulty of parameter tuning in practical applications of EAs. The optimal parameter settings can be problem dependent, and very small changes in the parameter settings can have big impacts on the efficiency of the algorithm.

Informally, the results for the functions studied here can be explained by the occurrence of an equilibrium state into which the non-elitist population enters after a certain time. In this state, the EA makes no further progress, even though there is a fitness gradient in the search space. The position in the search space in which the equilibrium state occurs depends on the mutation rate and the selection pressure. When the number of new good individuals added to the population by selection equals the number of good individuals destroyed by mutation, then the population makes no further progress. If the equilibrium state occurs close to the global optimum, then the EA is efficient. If the equilibrium state occurs far from the global optimum, then the EA is inefficient. The results are theoretically significant because the impact of the selection-mutation interaction on the runtime of EAs has not previously been analysed. Furthermore, there exist few results on the runtime of population-based EAs, in particular those that employ both a parent and an offspring population. Our analysis answers a challenge by Happ et al. [11], to analyse a population-based EA using a non-elitist selection mechanism. Although this paper analyses selection and mutation on the surface, it actually touches upon a far more fundamental issue of the trade-off between exploration (driven by mutation) and exploitation (driven by selection). The analysis presented here could potentially be used to study rigorously the crucial issue of balancing exploration and exploitation in evolutionary search.
In addition to the theoretical results, this paper has also introduced some new analytical techniques to the analysis of evolutionary algorithms. In particular, the behaviour of the main part of the population and that of stray individuals are analysed separately. The analysis of stray individuals is achieved using a concept which we call non-selective family trees, which are then analysed as single- and multi-type branching processes. Furthermore, we apply the drift theorem in two dimensions, which is not commonplace. As already demonstrated in [17], these new techniques are applicable to a wide range of EAs and fitness functions.

A challenge for future experimental work is to design and analyse strategies 
for dynamically adjusting the mutation rate and selection pressure. Can self- 
adaptive EAs be robust on problems like those that are described in this paper? 
For future theoretical work, it would be interesting to extend the analysis to 
other problem classes, to other selection mechanisms, and to EAs that use a 
crossover operator. 